Abstract

This paper develops mathematical models describing the evolutionary dynamics of both asexually 
and sexually reproducing populations of diploid unicellular organisms. The asexual and sexual life 
cycles are based on the asexual and sexual life cycles in Saccharomyces cerevisiae, or Baker's yeast, 
which normally reproduces by asexual budding, but switches to sexual reproduction when stressed. 
The mathematical models consider three reproduction pathways: (1) asexual reproduction, (2)
self-fertilization, and (3) sexual reproduction. We also consider two forms of genome organization. In
one case, we assume that the genome consists of two multi-gene chromosomes, while in the second 
case we consider the opposite extreme and assume that each gene defines a separate chromosome, 
which we call the multi-chromosome genome. These two cases are considered in order to explore 
the role that recombination has on the mutation-selection balance and the selective advantage 
of the various reproduction strategies. We assume that the purpose of diploidy is to provide 
redundancy, so that damage to a gene may be repaired using the other, presumably undamaged 
copy (a process known as homologous recombination repair). As a result, we assume that the 
fitness of the organism only depends on the number of homologous gene pairs that contain at least 
one functional copy of a given gene. If the organism has at least one functional copy of every gene 
in the genome, we assume a fitness of 1. In general, if the organism has $l$ homologous pairs that
lack a functional copy of the given gene, then the fitness of the organism is $\kappa_l$. The $\kappa_l$ are assumed
to be monotonically decreasing, so that $\kappa_0 = 1 > \kappa_1 > \kappa_2 > \cdots > \kappa_\infty = 0$. For nearly all of
the reproduction strategies we consider, we find, in the limit of large $N$, that the mean fitness at
mutation-selection balance is $\max\{2e^{-\mu} - 1, 0\}$, where $N$ is the number of genes in the haploid set
of the genome, $\epsilon$ is the probability that a given DNA template strand of a given gene produces a
mutated daughter during replication, and $\mu = N\epsilon$. The only exception is the sexual reproduction
pathway for the multi-chromosomed genome. Assuming a multiplicative fitness landscape where
$\kappa_l = \alpha^l$ for $\alpha \in (0, 1)$, this strategy is found to have a mean fitness that exceeds the mean fitness of
all of the other strategies. Furthermore, while the other reproduction strategies experience a total
loss of viability due to the steady accumulation of deleterious mutations once $\mu$ exceeds $\ln 2$, no
such transition occurs in the sexual pathway. Indeed, in the limit as $\alpha \to 1$ for the multiplicative
landscape, we can show that the mean fitness for the sexual pathway with the multi-chromosomed
genome converges to $e^{-2\mu}$, which is always positive. We explicitly allow for mitotic recombination
in this work, which, in contrast to previous studies using different models, does not have any 
advantage over other asexual reproduction strategies. The results of this paper provide a basis 
for understanding the selective advantage of the specific meiotic pathway that is employed by 
sexually reproducing organisms. The results of this paper also suggest an explanation for why 
unicellular organisms such as Saccharomyces cerevisiae (Baker's yeast) switch to a sexual mode 
of reproduction when stressed. While the results of this paper are based on modeling mutation- 
propagation in unicellular organisms, they nevertheless suggest that, in more complex organisms 
with significantly larger genomes, sex is necessary to prevent the loss of viability of a population 
due to genetic drift. Finally, and perhaps most importantly, the results of this paper demonstrate a 
selective advantage for sexual reproduction with fewer and much less restrictive assumptions than 
previous work. 

PACS numbers: 87.23.-n, 87.23.Kg, 87.16.Ac

I. INTRODUCTION



The evolution and maintenance of sexual reproduction is regarded as one of the central 
problems of evolutionary biology (Bell 1982; Williams 1975; Maynard Smith 1978; Michod
1995; Hurst and Peck 1996; Agrawal 2006; Visser and Elena 2007). The various theories for 
the selective advantage for sex fall into one of two general categories: The first category of 
theories argues that sex provides a mechanism to purge deleterious mutations from a genome 
(Kondrashov 1988; Muller 1964; Bruggeman et al. 2003; Paland and Lynch 2006; Bernstein
et al. 1984; Michod 1995; Nedelcu et al. 2004; Barton and Otto 2005; Keightley and
Otto 2006), while the second category of theories argues that sex provides greater genetic 
variability that allows populations to adapt more quickly to changing environments (Bell 
1982; Hamilton et al. 1990; Howard and Lively 1994). 

The first category of theories has two versions: The first version, called the Deterministic 
Mutation Hypothesis, argues simply that sex provides a mechanism for purging deleterious 
mutations from a population, and thereby repairing the germ line (Kondrashov 1988). The 
problem with this theory is that it requires what appears to be an overly restrictive assump- 
tion regarding the dependence of organismal fitness on the number of deleterious mutations 
in the genome: In order for the Deterministic Mutation Hypothesis to hold, the organismal 
fitness must decrease increasingly rapidly with the number of deleterious mutations. This 
is a phenomenon known as synergistic epistasis, and the problem with this assumption is 
that it is not at all clear whether or not it is correct. Furthermore, the theory only works if 
mutation rates are at least one per genome per replication cycle, which is not the case for 
many simpler organisms that are capable of reproducing sexually. 

The second version of the first category of theories argues that sex prevents the accumulation
of mutations in a finite population. The argument is that a finite, asexually reproducing
population will steadily accumulate deleterious mutations over time. This phenomenon has
been termed Muller's Ratchet (Muller 1964). An alternative view holds that, in a finite
population, random mutations will lead to the elimination of organisms with the wild-type 
genome. Instead, random associations will be formed between functional and non-functional 
copies of genes at different locations in the genome. This is termed the Hill-Robertson effect, 
and leads to a reduction in fitness. In both interpretations of the consequence of finite pop- 
ulations, sexual reproduction breaks up associations between genes and thereby provides a 
mechanism for restoring mutation-free genomes. This process can slow down or even stop 
Muller's Ratchet, or alternatively, it may greatly mitigate the fitness reduction due to the 
Hill-Robertson effect (Keightley and Otto 2006). The problem with this theory is that it
relies on the assumption of a finite population, which is often interpreted as meaning that 
the population must be taken to be "small" in some sense. This is an ill-defined term, since 
it is not clear what the cutoff for a "small" population should be (generally this means 
that the population is sufficiently small that there are measurable deviations from infinite 
population behavior, due to significant reductions in genetic variation when compared with 
the infinite population at mutation-selection balance). 

The second category of theories also has two versions: The first version argues that 
sexual reproduction allows a population to adapt more quickly to changing environments 
(Bell 1982). The idea is that sexual reproduction allows for recombination among different 
organisms, and thereby increases the genetic variation of a population. In a dynamic en- 
vironment, this increased variation will increase the chances that some organism has a fit 
genome, thereby leading to faster adaptation (Bell 1982). This theory is sometimes called 
the Vicar of Bray Hypothesis, named after an English cleric who was known for changing
his opinion as political circumstances dictated (Bell 1982).

The second version of this category of theories is known as the Red Queen Hypothesis, 
and states that sexual reproduction evolved as a way for relatively slowly reproducing host 
organisms to survive in a co-evolutionary "genetic arms race" with quickly reproducing 
parasites. This theory derives its name from the Red Queen, a character in Lewis
Carroll's Through the Looking-Glass, who states, "it takes all the running you can do, to keep in
the same place" (Hamilton et al. 1990).

While this second category of theories is not necessarily incorrect, it is not clear that 
it offers a single, universal explanation for the evolution and maintenance of sexual repro- 
duction. The reason for this is that there are sexually reproducing organisms that have 
remained essentially unchanged for millions of years in what appear to be fairly static environments
(e.g. sharks and crocodiles). As a result, while sexual reproduction may indeed have
a selective advantage over asexual reproduction in dynamic environments, it is not clear 
that either a dynamic environment or co-evolutionary dynamics are necessary conditions for 
sexual reproduction to be advantageous over asexual reproduction. 

The question of the evolution and maintenance of sexual reproduction is actually com- 
posed of several questions. These are: (1) How did sex evolve, and what were the evolution- 
ary pressures leading to its emergence? (2) Once sex emerged, what were/are the selective 
advantages leading to its maintenance and ubiquity? (3) Why is there such a large variety 
in the specific implementation of sexual reproduction strategies among different organisms? 
For example, in some organisms, sexual reproduction is merely used as a stress response. 
Many other organisms, insects for example, can either reproduce asexually (parthenogenesis) 
or sexually. Still other organisms reproduce almost exclusively sexually, but can reproduce 
asexually if there is no other option. In some organisms there is no sex differentiation, that
is, each individual is a hermaphrodite capable of producing both sperm and eggs. Other
organisms have male/female differentiation; however, in all-female environments some of the
females can transform into males. Furthermore, the males play widely varying roles in organisms
with male/female differentiation. In some organisms, the males compete intensively
for females, so that only a small percentage of males ever succeed in mating; however, those
who do generally control a relatively large group of females. These males invest very little
energy in the raising of their offspring. This may be contrasted with organisms where males 
take an active role in the raising of the offspring. In these circumstances, typically a male 
only mates with a single female, and a higher percentage of males are able to find female 
mates. 

Clearly, there must be different regimes where the various implementations of asexual 
and sexual reproduction strategies are respectively advantageous. A cost-benefit analysis 
that could identify these parameter regimes in a manner that is consistent with observa- 
tion is a central aspect of the overall question of the evolution and maintenance of sexual 
reproduction. 

It is therefore clear that the question of the evolution and maintenance of sexual repro- 
duction is in fact a complex issue that cannot be addressed in a single study. Rather, this 
issue can only be resolved within the context of a concerted research program that addresses 
a relatively broad array of questions. 

Nevertheless, research on the evolution and maintenance of sexual reproduction must first 
begin by understanding the basic advantage that this reproduction strategy provides. Once 
this basic advantage is understood, it is then possible to study why specific implementations 
of asexual and sexual reproduction strategies are observed in different regimes, and it is also 
possible to attempt to reconstruct the evolutionary pathways for the emergence of sexual
reproduction.

As has been discussed in the preceding paragraphs, the various theories for the selective 
advantage for sexual reproduction all suffer from one or more deficiencies. As a result, even 
though much progress has been made in our understanding of the maintenance of sexual 
reproduction in many classes of organisms, the most fundamental question regarding the 
evolution and maintenance of sexual reproduction is still regarded as an open problem in 
evolutionary biology. 

Unicellular organisms are the ideal systems for studying the basic advantage for sexual 
reproduction over asexual reproduction. There are two reasons for this: First of all, because 
sexual reproduction already occurs in unicellular organisms, it makes sense to first study 
the selective advantage for sexual reproduction in these organisms, since their relative sim- 
plicity compared with multicellular organisms suggests that it will be possible to uncover 
the basic advantage for sexual reproduction without having to deal with additional compli- 
cations. Second, because unicellular organisms that are capable of reproducing sexually can 
also reproduce asexually, understanding the selective advantage for sexual reproduction in 
unicellular organisms will also help to delineate parameter regimes where asexual or sexual 
reproduction strategies are respectively advantageous. 

Saccharomyces cerevisiae, or Baker's yeast, is a model diploid unicellular organism that 
engages in a form of sexual reproduction when stressed. Thus, in this paper, we develop 
mathematical models describing asexual and sexual reproduction in unicellular organisms, 
where we take life cycles that are based on the asexual and sexual life cycles in S. cere- 
visiae (Herskowitz 1988; Mable and Otto 1998; De Massy et al. 1994; Roeder 1995). We 
assume multi-gene genomes comprised of semiconservatively replicating, double-stranded 
DNA molecules. While we still make a number of simplifying assumptions, we nevertheless
believe that the models considered in this paper are sufficiently realistic to be relevant for
actual biological systems. Consequently, we believe that the results we obtain in this paper 
may be used to draw definite conclusions about the relative selective advantage of various 
reproduction strategies in unicellular organisms. 

We consider three distinct reproduction mechanisms: Asexual reproduction, self- 
fertilization, and sexual reproduction. Furthermore, for each reproduction mechanism we 
consider two extremes of genome organization, in order to explore the effect of recombina- 
tion on the selective advantage for the various reproduction strategies: A two-chromosomed, 
multi-gene genome, and a multi-chromosomed genome where each chromosome consists of 
a single gene. 

The mathematical models considered here assume that the only purpose of diploidy is 
to provide genetic redundancy, or more specifically, a mechanism to repair double-stranded 
genetic damage on one gene using the other, presumably undamaged, corresponding region 
in the homologous gene. This process is known as homologous recombination repair. As a 
result, we assume that all organisms whose genomes contain at least one functional copy of 
every gene have the wild-type fitness, taken to be 1. While it is possible that loss of function
in one of the genes in a homologous pair can lead to a loss of fitness, if a cell has at least 
one functional copy of every gene in the genome, then it should remain viable. As a result, 
for a genome with a large number of genes, the fitness penalty for having an additional 
homologous pair with one non-functional copy of a gene should become steadily smaller as 
the number of homologous pairs with one non-functional copy of a gene increases. Thus, for 
the purposes of simplicity, we consider an initial fitness landscape where there is no fitness 
penalty for having homologous pairs with only one non-functional copy of a gene. In any 
event, it makes sense that the overall purpose of diploidy is to provide a mechanism for
repair and does not in general increase fitness. For if the latter were the case, then it is not
clear why two should be some kind of "magic number", in the sense that fitness is optimized
when an organism has two functional copies of every gene. If fitness could be significantly 
increased by increasing the number of copies of a given gene, then it seems that the optimal 
number of copies of a gene should be highly gene-dependent (for example, highly expressed 
genes may be present in numerous copies, while one copy may suffice for genes that are only 
expressed from time to time). 

While the fitness of the organism remains the wild-type fitness of 1 as long as the genome 
has at least one functional copy of every gene, we assume that the fitness of the organism 
is reduced for every homologous gene pair that lacks a functional copy of a given gene. 
Thus, if $l$ is the number of homologous gene pairs in the genome lacking a functional copy
of the given gene, then the fitness of the organism is $\kappa_l$, where we assume that the $\kappa_l$ are
monotonically decreasing, so that $\kappa_0 = 1 > \kappa_1 > \kappa_2 > \cdots > \kappa_\infty = 0$.

Based on the analysis that follows, we obtain, in the limit of large $N$, that the mean
fitness at mutation-selection balance for nearly all reproduction pathways is $\max\{2e^{-\mu} - 1, 0\}$,
where $N$ is the number of genes in the haploid set of the genome, $\epsilon$ is the probability
that a given template DNA strand of a given gene produces a mutant daughter as a result
of replication, and $\mu = N\epsilon$. The only exception is for the case of sexual reproduction in the
multi-chromosomed genome. Here, the mean fitness can significantly exceed $\max\{2e^{-\mu} - 1, 0\}$.

Furthermore, except for sexual reproduction in the multi-chromosomed genome, all of the 
other reproduction strategies experience a total loss of viability once $\mu$ exceeds $\ln 2$. Here,
the evolutionary dynamics of the population is characterized by the steady accumulation of 
deleterious mutations, and a steady decrease in fitness, eventually leading to a steady-state
mean fitness of 0. In the quasispecies model of evolutionary dynamics, this is known as
the error catastrophe, which is characterized by a localization to delocalization transition of 
the population over the genome space (Eigen 1971; Tannenbaum and Shakhnovich 2005). 
Because the population fitness drops to zero in this case, the population also undergoes 
what is known as lethal mutagenesis. While the error catastrophe and lethal mutagenesis 
are formally distinct phenomena, they can often be associated with one another, as is the 
case with the models being considered here (Bull and Wilke 2008). 

However, for sexual reproduction in the multi-chromosomed genome, the error catastrophe
does not occur as long as $\kappa_l > 0$ for each $l$. This result is interesting, for, although
it is based on an analysis of unicellular organisms, it nevertheless suggests that sexual reproduction
is necessary to prevent genetic drift and population extinction in more complex
organisms that have long genomes. For example, for S. cerevisiae, $\mu$ is on the order of 0.01,
which is well below $\ln 2 \approx 0.69$, while for humans (H. sapiens), $\mu$ is on the order of 3, which
is considerably larger than $\ln 2$. Thus, S. cerevisiae may not need to reproduce sexually in
order to remain viable (though sexual reproduction provides a selective advantage under
stressful conditions), but humans might simply die out if they were to reproduce asexually.
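
The comparison with $\ln 2$ can be checked numerically. The following Python sketch evaluates the large-$N$ mean fitness $\max\{2e^{-\mu} - 1, 0\}$ at the two order-of-magnitude values of $\mu$ mentioned in the text. The function name `mean_fitness_limit` is ours and purely illustrative; the sketch covers only the reproduction strategies to which this expression applies, not sexual reproduction in the multi-chromosomed genome.

```python
import math


def mean_fitness_limit(mu: float) -> float:
    """Large-N mean fitness max{2 exp(-mu) - 1, 0} stated in the text."""
    return max(2.0 * math.exp(-mu) - 1.0, 0.0)


# Value of mu at which 2 exp(-mu) - 1 reaches zero.
mu_crit = math.log(2.0)

# The mu values are the order-of-magnitude figures quoted in the text.
for label, mu in [("S. cerevisiae", 0.01), ("H. sapiens", 3.0)]:
    print(f"{label}: mu = {mu}, mean fitness = {mean_fitness_limit(mu):.4f}, "
          f"below threshold = {mu < mu_crit}")
```

For $\mu = 0.01$ the expression stays close to 1, whereas for $\mu = 3$ the term $2e^{-\mu} - 1$ is negative and the mean fitness is clipped to zero.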

It must be emphasized that this paper assumes a static fitness landscape, and assumes an 
infinite population, so that the selective advantage for sex does not arise due to a dynamic 
environment or a small population. Furthermore, in contrast to the Deterministic Mutation 
Hypothesis, we believe that our fitness landscape is a more "generic" one. In particular, 
synergistic epistasis is not necessary for sex to have a selective advantage. It is also not 
necessary for $\mu$ to be larger than 1 for sex to have an advantage. In fact, as long as each
of the $\kappa_l > 0$ for $l < \infty$, then sexual reproduction in the multi-chromosomed genome has a
selective advantage over the other reproduction strategies for all values of $\mu$.

Thus, in this paper, we have developed a model that yields a selective advantage for sex
under fewer and far less restrictive assumptions than previous models. Interestingly, our 
model essentially does this by explicitly incorporating the role of diploidy, which is a level 
of realism that was not considered in many previous studies. 

II. DESCRIPTION OF THE ORGANISMAL GENOMES AND FITNESS LANDSCAPES

In this section, we describe the two modes of genome organization that we will consider 
in this paper. Figure 1 may be useful for what follows. 

A. Two-chromosomed genome 

We begin with the two-chromosomed genome. Here, we assume that a unicellular or- 
ganism has a diploid genome consisting of two chromosomes, where each chromosome has 
N genes, labelled $1, \ldots, N$. We also assume that with each gene is associated a "master"
sequence (actually a pair of complementary sequences, since we are dealing with double- 
stranded DNA), corresponding to a functional copy of the gene, while any mutation to the 
master sequence renders the gene non-functional. This is the multi-gene generalization of 
the single-fitness-peak approximation often made in quasispecies models of evolutionary dy- 
namics (Bull et al. 2005; Wilke 2005; Tannenbaum and Shakhnovich 2005). While this 
assumption is obviously oversimplified (indeed, recent research suggests that genes may, on 
average, sustain up to six mutations before losing functionality (Zeldovich et al. 2007)), it
is the simplest non-trivial landscape that allows for mutation and selection (as opposed to 
random genetic drift). Furthermore, the single-fitness-peak landscape reflects the fact that
only a small fraction of all gene sequences will encode a gene carrying out a specific function,
which is why the single-fitness-peak approximation has been known to provide correct
order-of-magnitude estimates of various biological parameters (Kamp and Bornholdt 2002). 

We may denote a given chromosome by $\sigma = s_1 s_2 \cdots s_N$, where each $s_i = 1$ if gene $i$ is
functional, and $s_i = 0$ if gene $i$ is non-functional. This means that the genome of a given
organism may be represented by $\{\sigma_1, \sigma_2\}$, where $\sigma_1, \sigma_2$ represent each of the two chromosomes
in the genome.

During replication, the two DNA strands of each chromosome separate, and each strand 
forms the template for the synthesis of a complementary daughter strand (Tannenbaum and 
Shakhnovich 2005). Because mutations can occur during each daughter strand synthesis, 
both daughter genes of a given parent gene may contain mutations. We let $p$ denote the
probability that a template strand from a master copy of a gene forms a mutation-free
daughter, so that $1 - p$ is the probability that the template strand forms a mutated daughter.
If the template strand already has a mutation, then we assume that sequence lengths
are sufficiently long that any new mutations occur in a previously unmutated portion of 
the strand, so that a mutated template strand forms a non-functional daughter gene with 
probability 1. This assumption is known as the neglect of backmutations (Tannenbaum and 
Shakhnovich 2005). Mutation gives rise to a transition probability $p(\sigma', \sigma)$, which is defined
as the probability that a given template strand from chromosome $\sigma'$ produces the daughter
chromosome $\sigma$.
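
As an illustration of how the neglect of backmutations shapes this transition probability, the Python sketch below computes it for a chromosome encoded as a tuple of 0/1 entries. It assumes, as our own simplification, that genes mutate independently of one another, so that the probability factorizes over genes. The function name and the tuple encoding are illustrative choices and not part of the model definition.

```python
from itertools import product


def transition_probability(parent, daughter, p):
    """Probability that a template strand of `parent` yields `daughter`.

    Assumes independent per-gene mutation and no backmutation: a functional
    gene (1) stays functional with probability p, and a mutated gene (0)
    always yields a mutated daughter gene.
    """
    prob = 1.0
    for s_parent, s_daughter in zip(parent, daughter):
        if s_parent == 1:
            prob *= p if s_daughter == 1 else (1.0 - p)
        elif s_daughter == 1:
            return 0.0  # a 0 -> 1 transition would be a backmutation
    return prob


parent = (1, 1, 0)
# Summing over all possible daughters should recover total probability 1.
total = sum(transition_probability(parent, d, 0.9)
            for d in product((0, 1), repeat=len(parent)))
```

Any daughter that restores a non-functional gene receives probability zero, which is the content of the no-backmutation assumption in this encoding.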

We also define $\epsilon = 1 - p$, and we define $\mu = N\epsilon$. Here, $\mu$ is the average number of mutated
genes produced from template gene strands per replication cycle. In what follows in this
paper, we will consider the limit of $N \to \infty$ with $\mu$ held constant, which is equivalent to
holding the per genome replication fidelity constant in the limit of large genomes.

It should be noted that we are not necessarily assuming that the only source of mutations
in the genome is due to point-mutations during replication. The model allows for mutations 
that accumulate in the genome in between replications, due to base modifications and dam- 
age that occurs as a result of free radicals, radiation, and spontaneous chemical alterations. 
During the growth phase of the cell, repair mechanisms are constantly at work repairing 
this genetic damage. However, these genetic repair mechanisms are not infinitely fast, and 
so cannot completely eliminate all genetic damage. As a result, at the time of replication, 
there will always be some bases that are damaged, which can then lead to the fixation of 
mutations in the daughter genome as a consequence of daughter strand synthesis. This leads 
to an effective per genome, per replication cycle point mutation rate that is somewhat larger 
than would be expected if one considered daughter strand synthesis errors alone. 

We let $r_i$ denote the probability of mitotic recombination in this model (Mandegar and
Otto 2007), which is the probability that the two daughter chromosomes of a given parent
co-segregate into the same daughter cell. Mitotic recombination generally refers to individual
genes. However, in this model, we assume that the genes on a given chromosome
all co-segregate together, so that in this case $r_i$ refers to co-segregation of chromosomes. In
the multi-chromosome model to be discussed below, individual genes may segregate independently
of one another, so that $r_i$ then more accurately reflects the biological definition
of mitotic recombination.

We assume that cells replicate with first-order growth kinetics. We let $\kappa_{\{\sigma_1, \sigma_2\}}$ denote the
first-order growth rate constant of cells with genome $\{\sigma_1, \sigma_2\}$, and we let $n_{\{\sigma_1, \sigma_2\}}$ denote the
number of organisms in the population with genome $\{\sigma_1, \sigma_2\}$.

We define an ordered strand-pair representation of the population, by defining
$n_{(\sigma_1, \sigma_2)} = (1/2) n_{\{\sigma_1, \sigma_2\}}$ if $\sigma_1 \neq \sigma_2$, and $n_{(\sigma, \sigma)} = n_{\{\sigma, \sigma\}}$. We also define
$\kappa_{(\sigma_1, \sigma_2)} = \kappa_{\{\sigma_1, \sigma_2\}}$.

The ordered strand-pair representation leads to a method for characterizing a given ordered
strand-pair by three parameters, denoted $l_{10}, l_{01}, l_{00}$. $l_{10}$ denotes the number of homologous
gene pairs for which the allele in $\sigma_1$ is functional (i.e. a "1" gene) and the allele
in $\sigma_2$ is non-functional (i.e. a "0" gene), $l_{01}$ denotes the number of homologous gene pairs
for which the allele in $\sigma_1$ is non-functional, and the allele in $\sigma_2$ is functional. $l_{00}$ denotes the
number of homologous gene pairs where both alleles in $\sigma_1$ and $\sigma_2$ are non-functional. We
may also define $l_{11}$ to be the number of homologous gene pairs where both alleles in $\sigma_1$ and
$\sigma_2$ are functional. Note that $l_{11} = N - l_{10} - l_{01} - l_{00}$. Also note that, by definition of the
fitness landscape given in the Introduction, we have that $\kappa_{(\sigma_1, \sigma_2)} = \kappa_{l_{00}}$.
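
The bookkeeping behind these parameters can be sketched in a few lines of Python. Chromosomes are encoded here as tuples of 0/1 entries, and the fitness uses the multiplicative landscape mentioned in the Abstract, in which the fitness is $\alpha$ raised to the number of homologous pairs lacking a functional copy. The function names and the value $\alpha = 0.5$ are illustrative assumptions of ours.

```python
def classify_pair(sigma1, sigma2):
    """Return (l10, l01, l00, l11) for an ordered pair of chromosomes."""
    if len(sigma1) != len(sigma2):
        raise ValueError("chromosomes must have the same number of genes")
    counts = {(1, 0): 0, (0, 1): 0, (0, 0): 0, (1, 1): 0}
    for pair in zip(sigma1, sigma2):
        counts[pair] += 1
    return counts[(1, 0)], counts[(0, 1)], counts[(0, 0)], counts[(1, 1)]


def fitness(sigma1, sigma2, alpha=0.5):
    """Fitness under a multiplicative landscape, alpha ** l00 (alpha is illustrative)."""
    _, _, l00, _ = classify_pair(sigma1, sigma2)
    return alpha ** l00


sigma1 = (1, 1, 0, 0, 1)
sigma2 = (1, 0, 1, 0, 1)
l10, l01, l00, l11 = classify_pair(sigma1, sigma2)
```

Only the pairs in which both alleles are non-functional enter the fitness; pairs with a single functional copy carry no penalty in this landscape.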

B. Multi-chromosomed genome 

For the multi-chromosomed genome, we assume a diploid genome consisting of homol- 
ogous gene-pairs, where each gene defines a separate chromosome, giving rise to a genome 
consisting of 2N genes. We assume that the homologous pairs segregate independently of 
one another, though for each homologous pair we may assume a mitotic recombination probability
$r_i$, defined as in the previous subsection. Indeed, unless otherwise specified, all of the
definitions in the multi-gene, two-chromosome model are the same for the multi-chromosome 
model being considered here. 

Because the genes all lie on separate chromosomes, a diploid genome may be characterized
by the two parameters $l_{10}, l_{00}$, as opposed to the three parameters $l_{10}, l_{01}, l_{00}$ as in the previous
subsection. Here, a diploid genome characterized by the parameters $l_{10}, l_{00}$ has exactly $l_{10}$
homologous pairs with one functional gene and one non-functional gene (i.e. a "1" and a "0"),
and $l_{00}$ homologous pairs with two non-functional genes. As before, we have $l_{11} = N - l_{10} - l_{00}$.

Although both the two-chromosomed and multi-chromosomed genomes represent extremes
of genome organization, we argue that, due to the Law of Independent Assortment
of Alleles in classical genetics, the dynamics arising from the multi-chromosomed genome 
more closely approximates the true segregation dynamics of genes in actual organisms. 

III. ASEXUAL REPRODUCTION 

A. Description of the reproduction pathway 

In the asexual reproduction pathway, each chromosome replicates, and then the daughter 
chromosomes segregate into one of the two daughter cells. Each daughter cell receives two of 
the daughter chromosomes from a given homologous pair, and it is assumed that daughter 
chromosomes from distinct homologous pairs segregate independently of one another. 

If there is no mitotic recombination, then the two daughters of a given parent segregate 
into distinct daughter cells. With mitotic recombination, the two daughter chromosomes 
(or genes, in the case of the multi-chromosomed genome) of a given parent chromosome 
co-segregate into the same daughter cell. As mentioned previously, mitotic recombination 
for each homologous pair occurs with probability $r_i$.
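
The segregation step described above can be sketched in Python for a single homologous pair. This is a minimal illustration under our own simplifying assumptions: replication is treated as error-free so that only segregation is shown, and the function name `segregate_asexual` and the argument `r` (the mitotic recombination probability) are hypothetical names introduced for this example.

```python
import random


def segregate_asexual(sigma1, sigma2, r, rng):
    """Return the genomes of the two daughter cells for one homologous pair.

    Replication is treated as error-free here so that only segregation is shown.
    """
    a1, a2 = sigma1, sigma1  # daughters of parent chromosome sigma1
    b1, b2 = sigma2, sigma2  # daughters of parent chromosome sigma2
    if rng.random() < r:
        # Mitotic recombination: both daughters of a parent go to the same cell.
        return (a1, a2), (b1, b2)
    # Otherwise each daughter cell receives one daughter of each parent.
    return (a1, b1), (a2, b2)


rng = random.Random(0)
without_recomb = segregate_asexual((1, 0), (0, 1), r=0.0, rng=rng)
with_recomb = segregate_asexual((1, 0), (0, 1), r=1.0, rng=rng)
```

In this toy example, co-segregation turns a parent with a functional copy of each gene into two daughter cells that each lack a functional copy of one gene, whereas ordinary segregation reproduces the parental genome in both daughters.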

Figure 2 illustrates the asexual reproduction pathway. 

B. Two-chromosomed genome 

1. Evolutionary dynamics equations 

In Appendix A.1, we show that the evolutionary dynamics of a population of asexually
reproducing organisms with two-chromosomed genomes is given by.

Here, $z_{l_1 l_2 l_3}$ defines the total fraction of the ordered strand-pair population characterized
by the parameters $l_{10} = l_1$, $l_{01} = l_2$, $l_{00} = l_3$, and $\bar{\kappa}(t)$ is the average first-order growth rate
constant of the entire population, a quantity known as the mean fitness. We have that

$$\bar{\kappa}(t) = \sum_{l_1 = 0}^{N} \sum_{l_2 = 0}^{N - l_1} \sum_{l_3 = 0}^{N - l_1 - l_2} \kappa_{l_3} z_{l_1 l_2 l_3}.$$

2. Mean fitness at mutation-selection balance

For all of the reproduction strategies being considered in this paper, the central object of 
interest is the mean fitness of the population at mutation-selection balance (or equivalently, 
at steady-state). The reason for this is that the mean fitness, by measuring the first-order 
growth rate constant of the population as a whole, determines which population will drive 
the other to extinction when two or more populations are mixed together. Due to the nature 
of exponential growth, the population with the largest mean fitness will drive the others to 
extinction, which means that the reproduction strategy that the winning population employs 
is the reproduction strategy that has the selective advantage over the others for the given
set of parameters.

This approach to determining which reproduction strategy is optimal for a given set of 
parameters is known as the group selection approach. The group selection approach may be 
criticized in that it does not take into account the fact that selection acts on individuals, 
rather than populations. An individual organism whose genes code for an optimal survival 
strategy in the given environment will out-reproduce the other organisms in the population. 
This survival strategy may not necessarily coincide with the optimal survival strategy for 
the population as a whole. Indeed, it is well-known that the group selection approach is 
inadequate for taking into account effects such as co-evolutionary dynamics, parasitism, and 
defection from cooperative strategies. 

Despite the deficiencies of the group selection approach in general, it can give correct 
results under certain circumstances. In cases where different populations or individuals do 
not directly interact with one another, so that one organism does not increase its fitness at 
the expense of the other, the group selection approach is a valid method for determining 
which genes will be selected for in a given environment. 

In this paper, we make the simplifying assumption that populations with distinct
reproduction strategies do not mix with one another (that is, sexuals interact with sexuals,
asexuals with asexuals, etc.), so that in our case the group selection approach is valid. The
group selection approach, however, would be problematic if we wished to consider not the 
maintenance of sexual reproduction, but rather the evolution and emergence of sexual re- 
production from an asexual population. Indeed, in recent work we found that pure sexual 
replicators could not arise from an asexual population, because their initial population den- 
sity would be so low as to lead to large mating times that would completely eliminate any 
benefit for sex (Tannenbaum and Fontanari 2008). 
At mutation-selection balance, the mean fitness is given by,

$$\bar{\kappa} = \max\{\kappa_l[2(1-\epsilon)^{N-l} - 1] \mid l = 0, \ldots, N\} \tag{2}$$



It must be emphasized that this result is the exact finite-$N$ solution for the steady-state
mean fitness, and does not depend on the value of $r_i$.

In the limit as $N \to \infty$ with $\mu$ held constant, we have,

$$\bar{\kappa} \to \max\{2e^{-\mu} - 1, 0\} \tag{3}$$

where this result is independent of both $r_i$ and the specific nature of the fitness function
$\{\kappa_l\}$ (assuming that the fitness function satisfies the monotonicity condition given in the
Introduction).

The transition between the two functional forms for $\bar{\kappa}$ at $\mu = \ln 2$ corresponds to a
localization to delocalization transition known as the error catastrophe. Beyond this value
of $\mu$, the mutation rate is sufficiently high that natural selection can no longer localize the
population to a given region of the genome space, and the result is the loss of viability due
to genetic drift. If we include decay terms in our model (e.g. death or loss of organisms
due to flow out of a chemostat), then this loss of viability can lead to the extinction of the
population (a phenomenon known as lethal mutagenesis).

To avoid encumbering the biologically relevant results of our model (i.e. the steady-state
mean fitness) with the detailed mathematical derivations, we have placed the mathematical
derivations in the following subsubsection. We believe that the mathematical analysis is
sufficiently interesting that it should not be relegated to an Appendix. However, we place
it in a separate section from the main results so that the reader can choose to simply skip
over the mathematical details.
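The finite-$N$ result of Eq. (2) and its large-$N$ limit in Eq. (3) can be compared numerically. The sketch below is illustrative only: it assumes a multiplicative landscape $\kappa_l = \alpha^l$ and takes $\epsilon = \mu/N$ (an assumption consistent with $(1-\epsilon)^N \to e^{-\mu}$); the function names are hypothetical.

```python
import math


def mean_fitness_finite(n_genes, eps, kappa):
    """Eq. (2): max over l of kappa(l) * [2 (1 - eps)^(N - l) - 1]."""
    return max(
        kappa(l) * (2.0 * (1.0 - eps) ** (n_genes - l) - 1.0)
        for l in range(n_genes + 1)
    )


def mean_fitness_limit(mu):
    """Eq. (3): the large-N limit, max{2 exp(-mu) - 1, 0}."""
    return max(2.0 * math.exp(-mu) - 1.0, 0.0)


def multiplicative_landscape(alpha):
    """Illustrative fitness landscape kappa_l = alpha**l (an assumption)."""
    return lambda l: alpha ** l


if __name__ == "__main__":
    n_genes = 2000
    kappa = multiplicative_landscape(0.8)
    for mu in (0.1, 0.5, math.log(2.0), 1.0):
        eps = mu / n_genes  # assumed relation between eps and mu
        print(mu, mean_fitness_finite(n_genes, eps, kappa), mean_fitness_limit(mu))
```

Below $\mu = \ln 2$ the two values should agree closely for large $N$, while above it the finite-$N$ value is a small positive number that vanishes as $N$ grows.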
3. Mathematical derivation of the mean fitness at mutation-selection balance

To determine the mean fitness at mutation-selection balance, denoted by $\bar{\kappa}$, we proceed as
follows: We define a generating function (Wilf 2006) $w_l(\beta_1, \beta_2, t)$, defined over the population
distribution $\{z_{l_1,l_2,l_3}\}$, via,

$$w_l(\beta_1, \beta_2, t) = \sum_{l_1=0}^{N-l} \sum_{l_2=0}^{N-l-l_1} \beta_1^{l_1} \beta_2^{l_2} z_{l_1,l_2,l} \tag{4}$$

and we also let $w_l(\beta_1, \beta_2)$ denote the steady-state value of $w_l(\beta_1, \beta_2, t)$.

In Appendix D we show that, at mutation-selection balance, the following equation holds
for $\beta_1 = \beta$, $\beta_2 = 1 - \beta$:

dwiif3,l-f3,t) ^ _ ^^N-i^^,^^^^^ t) + {l- r,)wi{(3, 1 - (3, t)) - wi{P, 1 - (3, t)] 

at 

-R{t)wi{(3,l-P,t) (5) 



where equality holds if $l = 0$, or if $z_{l_1,l_2,l_3} = 0$ for $l_3 < l$. Setting $\beta = 1$ we obtain,

$$\frac{d w_l(1, 0, t)}{dt} \geq [\kappa_l(2(1-\epsilon)^{N-l} - 1) - \bar{\kappa}(t)] w_l(1, 0, t) \tag{6}$$

and so, if we assume that the system converges to a stable steady-state, then we must have
that $\bar{\kappa} \geq \kappa_l[2(1-\epsilon)^{N-l} - 1]$ for all $l = 0, \ldots, N$, and so
$\bar{\kappa} \geq \max\{\kappa_l[2(1-\epsilon)^{N-l} - 1] \mid l = 0, \ldots, N\}$.

Let $l^*$ denote the smallest value of $l_3$ such that there exist $l_1, l_2$ for which $z_{l_1,l_2,l_3} > 0$ at
steady-state. Because the $z_{l_1,l_2,l_3}$ sum to 1, it follows that some of them must be positive,
and hence such an $l^*$ must exist.

We have that $w_{l^*}(1/2, 1/2) > 0$. If we also have that $w_{l^*}(1, 0) > 0$, then,

$$0 = [\kappa_{l^*}(2(1-\epsilon)^{N-l^*} - 1) - \bar{\kappa}] w_{l^*}(1, 0) \tag{7}$$

which implies that $\bar{\kappa} = \kappa_{l^*}[2(1-\epsilon)^{N-l^*} - 1]$.

If, on the other hand, we have that $w_{l^*}(1, 0) = 0$, then we obtain,

$$0 = [\kappa_{l^*}(2(1-r_i)(1-\epsilon)^{N-l^*} - 1) - \bar{\kappa}] w_{l^*}(1/2, 1/2) \tag{8}$$

which implies that $\bar{\kappa} = \kappa_{l^*}[2(1-r_i)(1-\epsilon)^{N-l^*} - 1]$.

If $r_i = 0$ then the two expressions for $\bar{\kappa}$ are identical. If $r_i > 0$, however, then the second
expression is smaller than the first, which is impossible, given the inequality that $\bar{\kappa}$ must
satisfy. Therefore, for $r_i > 0$, we must have that $w_{l^*}(1, 0) > 0$ and so in any case we have
$\bar{\kappa} = \kappa_{l^*}[2(1-\epsilon)^{N-l^*} - 1]$. However, given that $\bar{\kappa} \geq \max\{\kappa_l[2(1-\epsilon)^{N-l} - 1] \mid l = 0, \ldots, N\}$,
we must have that $\bar{\kappa} = \kappa_{l^*}[2(1-\epsilon)^{N-l^*} - 1] = \max\{\kappa_l[2(1-\epsilon)^{N-l} - 1] \mid l = 0, \ldots, N\}$.

Now, let us consider the limit as $N \to \infty$ while holding $\mu$ fixed, and let us consider two
different regimes, the first where $2e^{-\mu} - 1 > 0$, and the second where $2e^{-\mu} - 1 \leq 0$. The
first regime corresponds to the interval $0 \leq \mu < \ln 2$, while the second corresponds to the
interval $\mu \geq \ln 2$.

Given that the $\kappa_l$ are monotonically decreasing, and given that $\lim_{l \to \infty} \kappa_l = 0$, it follows
that, given any $\epsilon' > 0$, there exists some $l_{\epsilon'} > 0$ such that $\kappa_l < \epsilon'$ whenever $l > l_{\epsilon'}$. We may
relax this condition somewhat, in order to allow for the possibility that finite genome sizes
affect the fitness landscape, but that the fitness landscape nevertheless converges as $N \to \infty$
to a landscape that satisfies the property given above.

Thus, we assume that the fitness landscape has the following property: For every $\epsilon' > 0$,
there exists an $l_{\epsilon'} > 0$ and an $N_{\epsilon'} > 0$ such that $\kappa_l < \epsilon'$ whenever $l > l_{\epsilon'}$ and $N > N_{\epsilon'}$.

So, suppose that $\mu \in [0, \ln 2)$, so that $2e^{-\mu} - 1 > 0$. Then let us assume that $l, N$ are
sufficiently large so that $\kappa_{l'} < 2e^{-\mu} - 1$ for all $l' > l$. Then, given $\epsilon' > 0$, choose $N_{\epsilon'}$ to be
such that $|(1-\epsilon)^n - e^{-\mu}| < \epsilon'$ for all $n > N_{\epsilon'}$. Then, for $l' \leq l$ we have, for $N > N_{\epsilon'} + l$,
that,

$$\kappa_{l'}[2(1-\epsilon)^{N-l'} - 1] < \kappa_{l'}[2(e^{-\mu} + \epsilon') - 1] = \kappa_{l'}(2e^{-\mu} - 1) + 2\kappa_{l'}\epsilon' \leq 2e^{-\mu} - 1 + 2\epsilon' \tag{9}$$

Now, for $l' > l$ we have that,

$$\kappa_{l'}[2(1-\epsilon)^{N-l'} - 1] \leq \kappa_{l'} < 2e^{-\mu} - 1 < 2e^{-\mu} - 1 + 2\epsilon' \tag{10}$$

and so we have that $\bar{\kappa} < 2e^{-\mu} - 1 + 2\epsilon'$. However, we also have, for $N > N_{\epsilon'} + l$, that,

$$\bar{\kappa} \geq 2(1-\epsilon)^N - 1 > 2(e^{-\mu} - \epsilon') - 1 = 2e^{-\mu} - 1 - 2\epsilon' \tag{11}$$

and so we have that $2e^{-\mu} - 1 - 2\epsilon' < \bar{\kappa} < 2e^{-\mu} - 1 + 2\epsilon'$. Since $\epsilon' > 0$ is arbitrary, it follows
that, for $\mu \in [0, \ln 2)$, we have that $\bar{\kappa} \to 2e^{-\mu} - 1$ as $N \to \infty$.

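Now suppose that $\mu \in [\ln 2, \infty)$, so that $2e^{-\mu} - 1 \leq 0$. Then given some $\epsilon' > 0$, choose
$l, N$ to be sufficiently large so that $\kappa_{l'} < \epsilon'$ for all $l' > l$. Then, choose $N_{\epsilon'}$ to be such that
$|(1-\epsilon)^n - e^{-\mu}| < \epsilon'/2$ for all $n > N_{\epsilon'}$. Then, for $l' \leq l$ we have, for $N > N_{\epsilon'} + l$, that,

$$\kappa_{l'}[2(1-\epsilon)^{N-l'} - 1] < \kappa_{l'}[2(e^{-\mu} + \epsilon'/2) - 1] = \kappa_{l'}(2e^{-\mu} - 1) + \kappa_{l'}\epsilon' \leq \epsilon' \tag{12}$$

while for $l' > l$ we have that,

$$\kappa_{l'}[2(1-\epsilon)^{N-l'} - 1] \leq \kappa_{l'} < \epsilon' \tag{13}$$

and so we have that $\bar{\kappa} < \epsilon'$. However, we also have that $\bar{\kappa} \geq 0$, so since $\epsilon' > 0$ is arbitrary,
it follows that, for $\mu \in [\ln 2, \infty)$, $\bar{\kappa} \to 0$ as $N \to \infty$.

The result of our analysis is that $\bar{\kappa} = \max\{2e^{-\mu} - 1, 0\}$ in the $N \to \infty$ limit.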
When $r_i = 0$, we may prove that $z_{l_1,l_2,l_3} = 0$ at steady-state whenever $l_1 + l_2 + l_3 < N$. We
will prove this by contradiction. So, suppose that there exist $l_1, l_2, l_3$ where $l_1 + l_2 + l_3 < N$
such that $z_{l_1,l_2,l_3} > 0$. Then let us choose $l_3^*$ to be the smallest value of $l_3$ for which there
exist $l_1, l_2$ with $l_1 + l_2 + l_3 < N$ and $z_{l_1,l_2,l_3} > 0$. This means that, whenever $l_3 < l_3^*$, then
$z_{l_1,l_2,l_3} > 0 \Rightarrow l_1 + l_2 + l_3 = N$.

Now, given $l_3^*$, choose $l_1^*, l_2^*$ so that $l_1^* + l_2^*$ is the smallest value of $l_1 + l_2$ for which $z_{l_1,l_2,l_3^*} > 0$.
Then, in Eq. (1), setting $l_1 = l_1^*$, $l_2 = l_2^*$, $l_3 = l_3^*$, we have that $l_1' + l_2' + l_3' + l_4' + l_5' \leq l_1^* + l_2^* + l_3^* < N$,
so by definition of $l_3^*$, we must have $z_{l_1'+l_3',l_2'+l_4',l_5'} = 0$ for $l_5' < l_3^*$. Therefore, in Eq. (1),
we need only consider $l_5' = l_3^*$, which implies that $l_3' = l_4' = 0$, and so $z_{l_1'+l_3',l_2'+l_4',l_5'} = z_{l_1',l_2',l_3^*}$.
Furthermore, because $l_1' \leq l_1^*$, $l_2' \leq l_2^*$, we have $l_1' + l_2' \leq l_1^* + l_2^*$, with equality if and only
if $l_1' = l_1^*$, $l_2' = l_2^*$. By definition of $l_1^*, l_2^*$, it follows that $z_{l_1',l_2',l_3^*} = 0$ unless $l_1' = l_1^*$, $l_2' = l_2^*$.
Putting everything together, we obtain that, at steady-state, Eq. (1) becomes, for $r_i = 0$,

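$$0 = [\kappa_{l_3^*}(2(1-\epsilon)^{N-l_3^*}(1-\epsilon)^{N-l_1^*-l_2^*-l_3^*} - 1) - \bar{\kappa}] z_{l_1^*,l_2^*,l_3^*} \tag{14}$$

which implies that $\bar{\kappa} = \kappa_{l_3^*}(2(1-\epsilon)^{N-l_3^*}(1-\epsilon)^{N-l_1^*-l_2^*-l_3^*} - 1)$. However, because $l_1^* + l_2^* + l_3^* < N$,
it follows that $(1-\epsilon)^{N-l_1^*-l_2^*-l_3^*} < 1 \Rightarrow \bar{\kappa} < \kappa_{l_3^*}(2(1-\epsilon)^{N-l_3^*} - 1)$, which contradicts the lower bound on $\bar{\kappa}$.

With this contradiction, our claim is proved.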

C. Multi-chromosomed genome 

1. Evolutionary dynamics equations 

In Appendix A.2, we show that the evolutionary dynamics of a population of asexually
reproducing organisms with multi-chromosomed genomes is given by. 
^'^''^ = -{Ki, + K{t))zi,,i, +2 ^ E E E X



dt 



(l,-l'^)\{l,-l'^-l-y.{N-h-h-l'y^ ''^ ''^ 

(15) 

where $z_{l_1,l_2}$ is the total fraction of the population whose genomes are characterized by the
parameters $l_{10} = l_1$, $l_{00} = l_2$, and the mean fitness $\bar{\kappa}(t)$ is given by $\bar{\kappa}(t) = \sum_{l_1=0}^{N} \sum_{l_2=0}^{N-l_1} \kappa_{l_2} z_{l_1,l_2}$.



2. Mean fitness at mutation-selection balance

As with the two-chromosomed genome, the mean fitness for the multi-chromosomed
genome at mutation-selection balance is given by,

$$\bar{\kappa} = \max\{\kappa_l[2(1-\epsilon)^{N-l} - 1] \mid l = 0, \ldots, N\} \tag{16}$$

where this result is independent of the value of $r_i$.

In the limit as $N \to \infty$ with $\mu$ held fixed, we obtain that,

$$\bar{\kappa} \to \max\{2e^{-\mu} - 1, 0\} \tag{17}$$

3. Mathematical derivation of the mean fitness at mutation-selection balance

To determine the mean fitness at mutation-selection balance, we proceed analogously
to the two-chromosomed case: We define a generating function $w_l(\beta, t)$, defined over the
population distribution $\{z_{l_1,l_2}\}$, via,

$$w_l(\beta, t) = \sum_{k=0}^{N-l} \beta^k z_{k,l} \tag{18}$$

and we also let $w_l(\beta)$ denote the steady-state value of $w_l(\beta, t)$.

Following a similar procedure to the derivation in Appendix D, we may show that, 



> + R{t))w,{P, t) + 2(1 - ef-^il + (2/3 - l)^]^-'^.^^ l + ^'^-l)^^ ^ 

(19) 

with equality if $l = 0$ or if $z_{l_1,l_2} = 0$ for $l_2 < l$.
Setting $\beta = 1/2$ gives,

$$\frac{d w_l(1/2, t)}{dt} \geq [\kappa_l(2(1-\epsilon)^{N-l} - 1) - \bar{\kappa}(t)] w_l(1/2, t) \tag{20}$$

with equality if $l = 0$ or if $z_{l_1,l_2} = 0$ for $l_2 < l$. As with the two-chromosomed model, this
implies that $\bar{\kappa} \geq \max\{\kappa_l[2(1-\epsilon)^{N-l} - 1] \mid l = 0, \ldots, N\}$.

Let $l^*$ denote the smallest value of $l_2$ such that there exists an $l_1$ for which $z_{l_1,l_2} > 0$ at
steady-state. Then since $z_{l_1,l_2} = 0$ for $l_2 < l^*$, we have, at steady-state, that,

$$0 = [\kappa_{l^*}(2(1-\epsilon)^{N-l^*} - 1) - \bar{\kappa}] w_{l^*}(1/2) \tag{21}$$

Because there exists an $l_1$ for which $z_{l_1,l^*} > 0$, it follows that $w_{l^*}(1/2) > 0$, and so
$\bar{\kappa} = \kappa_{l^*}[2(1-\epsilon)^{N-l^*} - 1] \Rightarrow \bar{\kappa} = \max\{\kappa_l[2(1-\epsilon)^{N-l} - 1] \mid l = 0, \ldots, N\}$.

As is the case for the two-chromosomed model, it follows that $\bar{\kappa} \to \max\{2e^{-\mu} - 1, 0\}$ as
$N \to \infty$.

For $r_i = 0$, suppose that there exist $l_1, l_2$ with $l_1 + l_2 < N$ such that $z_{l_1,l_2} > 0$ at
steady-state. Then let $l_2^*$ be the smallest value of $l_2$ for which there exists an $l_1$ with $l_1 + l_2 < N$
such that $z_{l_1,l_2} > 0$ at steady-state. Then let $l_1^*$ be the smallest value of $l_1$ such that $z_{l_1,l_2^*} > 0$
at steady-state.

In Eq. (15), when $r_i = 0$ we have that $l_1' = 0$. We also have, for $l_1 = l_1^*$, $l_2 = l_2^*$, that
$l_2' + l_3' + l_4' \leq l_1^* + l_2^* < N$, so $z_{l_2'+l_3',l_4'} = 0$ for $l_4' < l_2^*$, and so we may take $l_4' = l_2^*$, $l_3' = 0$.

Now, by definition of $l_1^*$, we have that $z_{l_2',l_2^*} = 0$ whenever $l_2' < l_1^*$, and so we may take $l_2' = l_1^*$.
At steady-state, Eq. (15) then becomes,

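$$0 = [\kappa_{l_2^*}(2(1-\epsilon)^{N-l_2^*}(1-\epsilon)^{N-l_1^*-l_2^*} - 1) - \bar{\kappa}] z_{l_1^*,l_2^*} \tag{22}$$

and so $\bar{\kappa} = \kappa_{l_2^*}(2(1-\epsilon)^{N-l_2^*}(1-\epsilon)^{N-l_1^*-l_2^*} - 1)$. Since $l_1^* + l_2^* < N$, it follows that $(1-\epsilon)^{N-l_1^*-l_2^*} < 1
\Rightarrow \bar{\kappa} < \kappa_{l_2^*}(2(1-\epsilon)^{N-l_2^*} - 1)$, which is a contradiction, thereby proving our claim.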

IV. SELF-FERTILIZATION 

A. Description of the reproduction pathway 

In the self-fertilization reproduction pathway, a diploid cell first divides via the asexual 
pathway into two diploid daughter cells. Each of the diploid daughter cells then divides into
two haploids, where each haploid receives exactly one chromosome from each homologous 
pair. The result is four haploids, which then pair at random with one another and fuse to 
form two diploid cells. 

This pathway is illustrated in Figure 3 for a two-chromosomed genome. As with the 
case for asexual reproduction, the multi-chromosomed case is similar, except that distinct 
homologous pairs segregate independently of one another. 

B. Two-chromosomed genome 

For the two-chromosomed genome, the equations for self-fertilization are identical to the
equations for asexual replication, with $r_i = 1/3$. The reason for this is that a given parent
diploid cell produces four haploids containing four chromosomes. Because mating is random,
a given chromosome has a probability of 1/3 of pairing with any other chromosome, which
gives $r_i = 1/3$.
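The counting behind the value 1/3 can be checked by brute force. The sketch below treats the four chromosomes as distinguishable labels 0 to 3 (a labeling introduced here only for illustration) and enumerates every way of splitting them into two unordered pairs.

```python
from fractions import Fraction
from itertools import permutations


def pairing_probability(n_chromosomes=4, a=0, b=1):
    """Probability that chromosomes a and b end up paired together when
    n_chromosomes labeled chromosomes are split uniformly at random into
    unordered pairs."""
    matchings = set()
    for perm in permutations(range(n_chromosomes)):
        # Read consecutive entries of the permutation as pairs; using
        # frozensets removes the ordering within and between pairs.
        pairs = frozenset(
            frozenset(perm[i:i + 2]) for i in range(0, n_chromosomes, 2)
        )
        matchings.add(pairs)
    hits = sum(1 for m in matchings if frozenset((a, b)) in m)
    return Fraction(hits, len(matchings))


if __name__ == "__main__":
    print(pairing_probability())  # exact fraction for chromosomes 0 and 1
```

With four chromosomes there are three distinct pairings, and any given chromosome is matched with each of the other three in exactly one of them.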



C. Multi-chromosomed genome 

1. Evolutionary dynamics equations 

In Appendix B, we show that the evolutionary dynamics of a population of organisms 
reproducing via the self-fertilization pathway are, for the multi-chromosome case, given by. 



.^N-h-h h h h-l'z _|_ /' _|_ /Ml 



dt V ■ V ^/ '1.'^ '3 ^ '4 'l-'2-'3.'4 l>\l'\l'\ 

[(|(1 - ef)\{\ - 6)(1 - r,(l - + |(1 - ef)''^ 

+2(^(1 - 6)^)'i((l - 6)(1 - i^(l - e)))\e + i^(l - efys] x 

(^-^1-^2-^3-^4)! r2 /I _ A]h-l'2(.2Y2-l'3-lU(l _ A^]N-h-h-l[ 



(23) 



2. Mean fitness at mutation-selection balance 

The mean fitness for the multi-chromosomed genome with the self-fertilization pathway
at mutation-selection balance is given by,

$$\bar{\kappa} = \max\{\kappa_l[2(1-\epsilon)^{N-l} - 1] \mid l = 0, \ldots, N\} \tag{24}$$

where this result is independent of the value of $r_i$.

In the limit as $N \to \infty$ with $\mu$ held fixed, we obtain that,

$$\bar{\kappa} \to \max\{2e^{-\mu} - 1, 0\} \tag{25}$$
3. Mean fitness at mutation-selection balance in the limit where $N \to \infty$

Defining $w_l(\beta, t)$ as for the case of asexual reproduction in the multi-chromosomed
genome, we obtain,

(26)

with equality if $l = 0$ or if $z_{l_1,l_2} = 0$ for $l_2 < l$.

Setting $\beta = 1/2$ gives,

$$\frac{d w_l(1/2, t)}{dt} \geq [\kappa_l(2(1-\epsilon)^{N-l} - 1) - \bar{\kappa}(t)] w_l(1/2, t) \tag{27}$$

Following a similar analysis to the one performed for the asexual, multi-chromosomed
case, we obtain that $\bar{\kappa} = \max\{\kappa_l[2(1-\epsilon)^{N-l} - 1] \mid l = 0, \ldots, N\}$, and that $\bar{\kappa} \to \max\{2e^{-\mu} - 1, 0\}$
in the limit where $N \to \infty$.

V. SEXUAL REPRODUCTION

A. Description of the reproduction pathway

In the sexual reproduction pathway, we assume that a diploid cell produces four haploids
in the same manner as for the self-fertilization pathway. However, instead of the four haploids
fusing with one another, the haploids enter a haploid pool, where they fuse at random with
haploids produced by other diploid parent cells. This reproduction pathway is illustrated in
Figure 4.

In contrast to self-fertilization, where we assume that the haploid fusion is fast (since the
haploids are in close proximity to one another, having been produced by the same parent),
with sexual reproduction we must take into consideration the haploid population.

A given haploid genome, whether it is derived from the two-chromosomed or
multi-chromosomed diploid genome, may be characterized by the parameter $l_0$, which is the number
of non-functional genes in the cell. We may then let $n_{l_0}$ denote the number of haploids in the
population whose genomes are characterized by the parameter $l_0$. Now, because a diploid
cell contains twice the number of chromosomes as the corresponding haploid, we define the
total population $n$ to be $n_D + n_H/2$, where $n_D$ is the total population of diploids, and $n_H$
is the total population of haploids. We then define the haploid population fractions $z_l$ via
$z_l = (1/2) n_l/n$. We define the total haploid population fraction $z_H = \sum_{l=0}^{N} z_l = (1/2) n_H/n$.

We assume that haploid fusion is a second-order process characterized by a second-order
rate constant $\gamma$. If $V$ denotes the system volume, then we assume that, as the population
grows, the volume increases so as to maintain a constant population density $\rho = n/V$.
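The bookkeeping for the population fractions can be sketched as follows. The haploid weighting of 1/2 follows the definitions above; the assumption that diploid fractions are simply diploid counts divided by $n$, as well as the function name and the example counts, are introduced here for illustration only.

```python
def population_fractions(diploid_counts, haploid_counts):
    """Convert raw counts into population fractions.

    diploid_counts: dict mapping a diploid genome label to its count.
    haploid_counts: dict mapping l (number of defective genes) to its count.
    The total population is n = n_D + n_H / 2, so each haploid counts as
    half a diploid.
    """
    n_d = sum(diploid_counts.values())
    n_h = sum(haploid_counts.values())
    n = n_d + n_h / 2.0
    # Assumed here: diploid fractions are counts divided by n.
    z_diploid = {key: count / n for key, count in diploid_counts.items()}
    # From the text: z_l = (1/2) n_l / n.
    z_haploid = {l: 0.5 * count / n for l, count in haploid_counts.items()}
    return z_diploid, z_haploid


if __name__ == "__main__":
    z_dip, z_hap = population_fractions({(0, 0, 0): 60, (1, 0, 0): 20}, {0: 30, 1: 10})
    print(sum(z_dip.values()), sum(z_hap.values()))
```

With this weighting the diploid fractions and the total haploid fraction $z_H$ add up to one.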

B. Two-chromosomed genome 

1. Evolutionary dynamics equations 

In Appendix C.1, we show that the evolutionary dynamics of a population of sexually
reproducing organisms with two-chromosomed genomes is given by. 
= -(^.3 + ^{t))z,,,,,, + 27P X

N -k-k-k + k ^ 1 
^11 N-h-k + k ^^11 N-h + k>'^'^''''^^'' 

k=l k=l 

, N-l I l-h l-h-h /, , 7 Nl 

(/-/2-^3-/4)!(A^-^-^i)! ^ ^ ^ ^ 

In this paper, we will consider for simplicity the limit as $\gamma\rho \to \infty$, so that the
characteristic haploid fusion time is negligible. At this stage, we are neglecting the time cost
for sex associated with the characteristic haploid fusion time, in order to see if we can first
identify a basic advantage for sex before considering costs that can reduce or eliminate this
advantage.

In the $\gamma\rho \to \infty$ limit, we have that the evolutionary dynamics equations are given by,



where. 



k=l ^ k=l 



N-l I l-h l-h-h /, , 7 \| 



Il=0l2=0l3=0 14=0 ^ 



(29) 
2. Mean fitness at mutation-selection balance in the limit where $N \to \infty$

The generating function approach that was successfully used to obtain the mean fitnesses
of the non-sexual reproduction pathways does not work for the sexual reproduction pathway.
Nevertheless, in the $N \to \infty$ limit, it is possible to derive an analytical expression for the
mean fitness of the two-chromosomed, sexual reproduction pathway, at mutation-selection
balance. Interestingly, in the limit as $N \to \infty$ at fixed $\mu$, we obtain that,

$$\bar{\kappa} = \max\{2e^{-\mu} - 1, 0\} \tag{31}$$

which is identical to the $N \to \infty$ limit of the other reproduction strategies.

Figure 5 shows a plot of $\bar{\kappa}$ versus $\mu$ for $N = 50$. We assume a multiplicative fitness
landscape, defined by $\kappa_l = \alpha^l$, with $\alpha = 0.8$. We present plots of $\bar{\kappa}$ using both the
analytical, $N \to \infty$ result, and results obtained by solving for the steady-state of the evolutionary
dynamics equations using fixed-point iteration. Note the good agreement between the
analytical result and the results obtained by fixed-point iteration. Due to finite size effects,
the numerically computed values of $\bar{\kappa}$ near $\mu = \ln 2$ are slightly larger than the analytical,
$N \to \infty$ result.

3. Mathematical derivation of the mean fitness at mutation-selection balance in the limit where $N \to \infty$

In the limit as $\gamma\rho \to \infty$, we obtain that $z_l \to 0$ for $l = 0, \ldots, N$, so that $\bar{\kappa}(t) z_l \to 0$.
However, it is possible that $\gamma\rho z_H z_l$ converges to some finite and possibly non-zero value.
Assuming a steady-state for the haploid population (because the $z_l \to 0$) we obtain,
N-l I l-h l-h-h ^ +1 )\

IpZHZi = ^^^ i^hZi^+i2,h,h \ 1^ ,^ " (1 - X 



(1 - e)''-'-'i (32) 



{l-l2-l3-kV.{N-l-h)\ 

Summing $l$ from $0$ to $N$ gives $\gamma\rho z_H^2 = \bar{\kappa}(t)$. Therefore, defining $\tilde{z}_l = z_l/z_H$, we may solve
for $\tilde{z}_l$ in terms of $\bar{\kappa}(t)$ and the diploid population fractions. Substituting the results into the
dynamical equations for the diploid population, we obtain Eq. (29).

In the limit as $N \to \infty$, one possible solution is simply that the population is completely
delocalized over the sequence space, and so $\bar{\kappa} = 0$. So, we first consider the regime where
$\bar{\kappa} > 0$. In Appendix E.1, we show, in the limit as $N \to \infty$, that,



{h + kV.ik + hV. A N-h-k-k + k ^ 1 ^ irW2w3 .... 
ijyhi N-h-h + k >^l\N-h + k^ ^ y.^W' ' ^^^^ 

So, at mutation-selection balance where $\bar{\kappa} > 0$, we have that,

2 1 Jihsu -hh. J. J. 

(34) 

Noting that $\tilde{z}_l = f_l/\bar{\kappa}$, we may substitute the expression for $z_{l_1,l_2,l_3}$ into the definition of
the $f_l$ to obtain, in the limit of large $N$, that,



fc=0 l4.=0 '"^ h=0 

Now, let $l^*$ denote the smallest value of $l$ for which $\tilde{z}_l > 0$. Since $\tilde{z}_l = 0$ for all $l < l^*$, we
have,



— Ei^^E^(^)'--^4.. (36) 

^4=0 tl=0 
In Appendix E.1, we show that the distribution for the $\tilde{z}_l$ approaches a Gaussian with a
mean that scales as $\sqrt{N}$ and a standard deviation that scales as $N^{1/4}$. If we then define a
probability density function $p(x)$ via $\sqrt{N} \tilde{z}_l = p(l/\sqrt{N})$, then in the limit of large $N$ we may
write,

/■OO °° K 1 

1 = 2e-^ / dxie-^Xxi) V {xx,y^ (37) 

Jo f^o'^ + i^kh^- 

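where $x = l^*/\sqrt{N}$. Now, using the inequality,

$$\frac{\kappa_l}{\bar{\kappa} + \kappa_l} \leq \frac{1}{\bar{\kappa} + 1} \tag{38}$$

we have,

$$1 \leq \frac{2e^{-\mu}}{\bar{\kappa} + 1} \tag{39}$$

and so $\bar{\kappa} \leq 2e^{-\mu} - 1$.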



Now, if we define $w_{100} = \sum_{l=0}^{N} z_{l,0,0}$, then it is possible to show, for finite $N$, that,

$$\frac{d w_{100}}{dt} = [2(1-\epsilon)^N - 1 - \bar{\kappa}(t)] w_{100} \tag{40}$$

from which it follows that $\bar{\kappa} \geq 2(1-\epsilon)^N - 1$ in order for the steady-state to be stable. In
particular, as $N \to \infty$, we obtain that $\bar{\kappa} \geq 2e^{-\mu} - 1$. Combined with the previous analysis
giving that $\bar{\kappa} \leq 2e^{-\mu} - 1$, we obtain that $\bar{\kappa} = 2e^{-\mu} - 1$. However, since $\bar{\kappa} \geq 0$, we have that
$\bar{\kappa} = \max\{2e^{-\mu} - 1, 0\}$.

C. Multi-chromosomed genome 

1. Evolutionary dynamics equations 

In Appendix C.2, we show that the evolutionary dynamics of a population of sexually 
reproducing organisms with multi-chromosomed genomes is given by, 
A;=l fe=l

{N-h-k-hV- 1-1,-1, .-^ _ .)N-i-h (4^1) 

Following a similar procedure to the two-chromosomed case, we obtain, in the limit as
$\gamma\rho \to \infty$, that,



dzia. , , -u^^ , 2 (I + - 1 + h)\ 



I TKT 1 1,1 h 



k=l k=l 



where. 



^1=0/2=0/3=0 



(42) 



(1 - e)^-'-'i (43) 



(/-/2-/3)!(iV-^-/i)! 
2. Mean fitness at mutation-selection balance in the limit where $N \to \infty$

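From Appendix E.2, we have that, whenever $\bar{\kappa} > 0$, then it is obtained by solving the
pair of equations,

$$1 = 2 \sum_{l=0}^{\infty} \frac{1}{l!} \frac{\lambda^{2l} \kappa_l}{\bar{\kappa} + \kappa_l} e^{-\lambda^2}$$

$$\mu = \lambda^2 \left[ 1 - 2 \sum_{l=0}^{\infty} \frac{1}{l!} \frac{\lambda^{2l} \kappa_{l+1}}{\bar{\kappa} + \kappa_{l+1}} e^{-\lambda^2} \right] \tag{44}$$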
When $\kappa_l = \delta_{l0}$, where $\delta_{ij}$ denotes the Kronecker delta function, we have, from the second
equation, that $\lambda^2 = \mu$. Substituting into the first equation, we obtain,

$$1 = \frac{2e^{-\mu}}{\bar{\kappa} + 1} \Rightarrow \bar{\kappa} = 2e^{-\mu} - 1 \tag{45}$$

and so $\bar{\kappa} = \max\{2e^{-\mu} - 1, 0\}$.

However, if $\kappa_l > 0$ for finite $l$, then we find that the steady-state mean fitness for the
sexual reproduction pathway for the multi-chromosomed genome exceeds the mean fitness
of $\max\{2e^{-\mu} - 1, 0\}$ for the other pathways. Admittedly, we have only checked this for
multiplicative fitness landscapes for which $\kappa_l = \alpha^l$, where $\alpha \in (0, 1)$. However, we conjecture
that this result will hold more generally, since the multiplicative fitness landscape seems to
be a reasonable first approximation for how $\kappa_l$ will vary with $l$. Essentially, what we are doing
with the multiplicative landscape is averaging over the various fitness penalties induced by
knocking out a given homologous pair in the genome. To be more precise, we are making an
optimal curve fit of the form $\alpha^l$ to the fitness values $\kappa_0 = 1, \kappa_1, \kappa_2, \ldots, \kappa_\infty = 0$. This can be
done by taking the natural logarithm of the fitness functions, and finding the optimal linear
fit $l \ln \alpha$ for the points $\ln \kappa_0 = 0, \ln \kappa_1, \ln \kappa_2, \ldots, \ln \kappa_\infty = -\infty$.
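One way to carry out such a fit is sketched below. The text does not specify what "optimal" means, so the sketch assumes an unweighted least-squares line through the origin in log space, and it skips zero fitness values (whose logarithm is $-\infty$); both choices, and the function name, are assumptions made for illustration.

```python
import math


def fit_alpha(kappas):
    """Fit kappa_l ~ alpha**l by least squares on ln(kappa_l) = l * ln(alpha).

    kappas[l] is the fitness with l knocked-out homologous pairs. The line is
    forced through the origin because ln(kappa_0) = ln(1) = 0.
    """
    num = 0.0
    den = 0.0
    for l, kappa in enumerate(kappas):
        if l == 0 or kappa <= 0.0:
            # l = 0 contributes nothing; ln(0) is -infinity and is excluded.
            continue
        num += l * math.log(kappa)
        den += l * l
    if den == 0.0:
        raise ValueError("need at least one positive fitness value with l >= 1")
    # Slope of the through-origin regression is sum(l * y) / sum(l * l).
    return math.exp(num / den)


if __name__ == "__main__":
    # Exactly multiplicative data recovers the alpha that generated it.
    print(fit_alpha([0.8 ** l for l in range(10)]))
```

Other weightings of the points would give a different effective $\alpha$, so the fitted value should be read as a summary of the landscape rather than a unique quantity.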

The fitness values $\{\kappa_l\}$ are themselves averages of the true fitness landscape of the
organism: For a given value of $l$, $\kappa_l$ is taken to be the average of all fitnesses obtained from
all possible genomes having $l$ homologous pairs lacking a functional copy of their respective
genes.

The fitness increase of the multi-chromosomed sexual pathway over the other pathways
becomes larger as $\alpha$ increases from 0 to 1. Crucially, the multi-chromosomed sexual pathway
does not appear to exhibit any kind of change in the functional form of $\bar{\kappa}$ at some critical
$\mu$, signaling the onset of an error threshold. Thus, it appears that the multi-chromosomed
sexual reproduction pathway considered in this paper does not have an error threshold, so
that a sexual population can survive at mutation rates where a non-sexual population would
lose viability and presumably go extinct.

Figure 6 shows a plot of $\bar{\kappa}$ versus $\mu$ for $N = 50$, assuming a multiplicative landscape
with $\alpha = 0.8$. We present plots obtained by numerically solving for $\bar{\kappa}$ using the $N \to \infty$
equations given in Eq. (44) (using a combination of fixed-point iteration and binary
search), by numerically solving the evolutionary dynamics equations themselves using
fixed-point iteration, and by stochastic simulations of finite populations of reproducing organisms.
For comparison, we also include a plot of the function $\max\{2e^{-\mu} - 1, 0\}$. Note the good
agreement that is obtained between the stochastic simulations, the fixed-point iteration of
the evolutionary dynamics equations, and the numerical solution of the $N \to \infty$ equations.
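A numerical solution of the $N \to \infty$ equations can be sketched as follows. This is not the authors' procedure: it uses two nested bisections rather than fixed-point iteration, and it assumes that Eq. (44) reads $1 = 2\sum_l P_l \kappa_l/(\bar{\kappa} + \kappa_l)$ and $\mu = \lambda^2[1 - 2\sum_l P_l \kappa_{l+1}/(\bar{\kappa} + \kappa_{l+1})]$, with Poisson weights $P_l = \lambda^{2l} e^{-\lambda^2}/l!$ and $\kappa_l = \alpha^l$. That reading is an assumption checked only against the limiting cases discussed in the text; all function names are hypothetical.

```python
import math


def poisson_weights(lam2, l_max):
    """Weights (1/l!) * lam2**l * exp(-lam2) for l = 0..l_max, via logs."""
    log_lam2 = math.log(lam2)
    return [math.exp(l * log_lam2 - lam2 - math.lgamma(l + 1))
            for l in range(l_max + 1)]


def solve_pair_for_lam2(lam2, alpha, n_iter=60):
    """For fixed lam2 = lambda^2, return (mu, kappa_bar) implied by the
    assumed pair of equations with kappa_l = alpha**l."""
    l_max = int(lam2 + 12.0 * math.sqrt(lam2)) + 30  # truncate the Poisson tail
    w = poisson_weights(lam2, l_max)
    k = [alpha ** l for l in range(l_max + 2)]
    lo, hi = 0.0, 1.0
    for _ in range(n_iter):
        # The right-hand side of the first equation decreases with kappa_bar.
        mid = 0.5 * (lo + hi)
        rhs = 2.0 * sum(w[l] * k[l] / (mid + k[l]) for l in range(l_max + 1))
        if rhs > 1.0:
            lo = mid
        else:
            hi = mid
    kappa_bar = 0.5 * (lo + hi)
    shifted = sum(w[l] * k[l + 1] / (kappa_bar + k[l + 1])
                  for l in range(l_max + 1))
    return lam2 * (1.0 - 2.0 * shifted), kappa_bar


def sexual_mean_fitness(mu, alpha, n_iter=60):
    """Bisection on lambda^2 until the implied mu matches the target (mu > 0).
    Assumes the implied mu increases with lambda^2 over the bracket."""
    lo, hi = 0.0, 1.0
    while solve_pair_for_lam2(hi, alpha)[0] < mu:
        hi *= 2.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if solve_pair_for_lam2(mid, alpha)[0] < mu:
            lo = mid
        else:
            hi = mid
    return solve_pair_for_lam2(0.5 * (lo + hi), alpha)[1]


if __name__ == "__main__":
    for mu in (0.25, 0.5, 1.0):
        print(mu, sexual_mean_fitness(mu, 0.8), max(2.0 * math.exp(-mu) - 1.0, 0.0))
```

For a landscape close to $\kappa_l = \delta_{l0}$ (very small $\alpha$) the sketch should reproduce $2e^{-\mu} - 1$ below the error threshold, which is a useful sanity check on the assumed form of the equations.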

We can obtain an analytical, closed form expression for $\bar{\kappa}$ in the limit that $\alpha \to 1$. We find
that $\lim_{\alpha \to 1} \bar{\kappa} = e^{-2\mu}$, a result that will be derived in the following subsubsection. Because
$e^{-2\mu} > 0$, this result is consistent with our claim that there is no error threshold for sexual
reproduction with the multi-chromosomed genome. Furthermore, because $e^{-2\mu} \geq 2e^{-\mu} - 1$,
with equality only occurring for $\mu = 0$, we obtain that this result is also consistent with our
observations that the sexual, multi-chromosomed mean fitness exceeds the mean fitness of
the other reproduction pathways as long as $\alpha > 0$.
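The inequality between the two limiting expressions follows from completing the square, as the short worked step below shows.

```latex
e^{-2\mu} - \left(2e^{-\mu} - 1\right)
  = \left(e^{-\mu}\right)^2 - 2e^{-\mu} + 1
  = \left(e^{-\mu} - 1\right)^2 \geq 0 ,
```

The square vanishes only when $e^{-\mu} = 1$, that is, when $\mu = 0$.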

Figure 7 shows a plot of $\bar{\kappa}$ versus $\mu$ for a multiplicative landscape with $\alpha = 0.99$. Because
$\alpha$ is so close to 1 here, we were unable to show results from either fixed-point iteration of
the evolutionary dynamics equations themselves or stochastic simulations,
since the required value of $N$ in both cases, and the required population size in the latter
case, would be so large as to make computation times prohibitive. However, we may readily
obtain numerical expressions for $\bar{\kappa}$ by solving the $N \to \infty$ equations given by Eq. (44), and
comparing the result with the analytical expression of $e^{-2\mu}$. As can be seen in Figure 7, the
results are indistinguishable.



3. Mathematical derivation of the mean fitness at mutation-selection balance in the limit as $N \to \infty$

Following a similar procedure to the two-chromosomed case, we have, in the limit of
large $N$, that the steady-state distribution $z_{l_1,l_2}$ satisfies,

where this analysis of course assumes that $\bar{\kappa} > 0$.

Substituting into the definition for fi, and following a similar procedure as was done for 
the two-chromosomed genome, we have, in the large N limit. 



^'-^^ 2^^ + ^ 2^ k\ ^ h\{i-k-ky. ^2' 

l3=0 '3 A:=0 Zi=0 ^ ^ ^ 

E''"^' 1 J4{h + i - h - k - k).u u(h+i-h-k-u) ^ , , 

-jji— -Y'e ^ zi,+i,zi,+i-k-h (47) 

l4=0 ^' 

As $N$ becomes large, we have observed that the $\tilde{z}_l$ approach a Gaussian distribution with
a mean that scales as $\sqrt{N}$ and a standard deviation that scales as $N^{1/4}$. As a result, if we
define a variable $x = l/\sqrt{N}$, then in the limit as $N \to \infty$ we can transform from a discrete
representation in terms of the $\tilde{z}_l$ into a continuous representation in terms of a probability
density $p(x)$, where conservation of probability implies that $p(x)(1/\sqrt{N}) = \tilde{z}_l \Rightarrow p(x) = \sqrt{N} \tilde{z}_l$.

The transformation from a discrete to a continuous representation allows us to re-write
Eq. (47) as an integral equation. We then take the Laplace transform of both sides of the
equation. Since we are dealing with the large $N$ limit, we expand the Laplace transforms
on both sides of the equation out to $1/\sqrt{N}$ and equate the two first-order expansions. This
leads to a set of equalities that must hold in the limit of large $N$, which gives us the pair of
equations in Eq. (44) that must be solved in order to obtain $\bar{\kappa}$. The details of this derivation
may be found in Appendix E.2.

Now, let us analyze the behavior of Eq. (44) in the limit as $\alpha \to 1$. In this limit, we expect
that $\lambda \to \infty$. The reason for this is as follows: In Appendix E.2, we define $\lambda$ to be such
that $\lambda\sqrt{N}$ is the average number of defective genes in a given haploid. We also have that,
as $N \to \infty$, the $\tilde{z}_l$ converge to a Gaussian distribution that in fact approaches a $\delta$-function
centered at $\lambda\sqrt{N}$. In Appendix E.2, we show that the probability that two haploids, each
having $\lambda\sqrt{N}$ defective genes, will fuse to form a diploid with exactly $l$ homologous gene
pairs lacking a functional copy of the given gene is given by,

$$\frac{1}{l!} \lambda^{2l} e^{-\lambda^2} \tag{48}$$

Therefore, on average, the overlap of two haploids at mutation-selection balance will
produce a diploid with a fitness of

$$\sum_{l=0}^{\infty} \frac{1}{l!} (\alpha\lambda^2)^l e^{-\lambda^2} = e^{-\lambda^2(1-\alpha)} \tag{49}$$
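The closed form for the average fitness after haploid fusion can be checked by summing the Poisson-weighted series directly. The sketch below assumes the multiplicative landscape $\kappa_l = \alpha^l$ and Poisson weights with mean $\lambda^2$; the function name, the truncation at `l_max`, and the test values are choices made here for illustration.

```python
import math


def mean_fusion_fitness(lam2, alpha, l_max=200):
    """Sum over l of alpha**l * (1/l!) * lam2**l * exp(-lam2), truncated at l_max.

    lam2 stands for lambda^2, the mean number of homologous pairs left
    without a functional copy after two haploids fuse.
    """
    total = 0.0
    term = math.exp(-lam2)  # the l = 0 term
    for l in range(l_max + 1):
        total += term
        # Next term: multiply by (alpha * lam2) / (l + 1).
        term *= alpha * lam2 / (l + 1)
    return total


if __name__ == "__main__":
    lam2, alpha = 4.5, 0.8
    print(mean_fusion_fitness(lam2, alpha), math.exp(-lam2 * (1.0 - alpha)))
```

The two printed numbers should agree to within floating-point rounding, since the series is just the exponential series for $e^{\alpha\lambda^2}$ multiplied by $e^{-\lambda^2}$.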

In order for the steady-state distribution to be localized, we expect, for a given $\mu > 0$,
that this quantity is below some value that is less than the wild-type fitness of 1. Otherwise,
haploid overlap will not lead to the purging of deleterious mutations from the population,
and thereby counter the mutation-accumulation induced by $\mu$. Indeed, the larger the value
of $\mu$, the greater the rate of mutation-accumulation, and so we expect that $e^{-\lambda^2(1-\alpha)}$ should
consequently decrease to purge deleterious mutations sufficiently effectively.

Thus, as $\alpha \to 1$, we expect $\lambda^2 \to \infty$ in order to keep the mean fitness of the diploids
produced from haploid fusion sufficiently small to counter mutation-accumulation and thereby
localize the population. However, as $\lambda \to \infty$, then the Poisson distribution approaches a
Gaussian distribution with a mean of $\lambda^2$ and a standard deviation of $\lambda$. We may therefore
write, in the limit as $\alpha \to 1$, that,

$$\frac{1}{l!} \lambda^{2l} e^{-\lambda^2} \to \frac{1}{\lambda\sqrt{2\pi}} \exp\left[-\frac{(l - \lambda^2)^2}{2\lambda^2}\right] \tag{50}$$

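For the multiplicative fitness landscape where $\alpha \to 1$, Eq. (44) then becomes,

$$1 = 2 \sum_{l=0}^{\infty} \frac{\alpha^l}{\bar{\kappa} + \alpha^l} \frac{1}{\lambda\sqrt{2\pi}} \exp\left[-\frac{(l - \lambda^2)^2}{2\lambda^2}\right]$$

$$\mu = \lambda^2 \left(1 - 2 \sum_{l=0}^{\infty} \frac{\alpha^{l+1}}{\bar{\kappa} + \alpha^{l+1}} \frac{1}{\lambda\sqrt{2\pi}} \exp\left[-\frac{(l - \lambda^2)^2}{2\lambda^2}\right]\right) \tag{51}$$

Now, let us define a continuous variable $x$ via $x = l/\lambda^2$. Then we have,

$$1 = 2 \int_0^{\infty} \frac{(\alpha^{\lambda^2})^x}{\bar{\kappa} + (\alpha^{\lambda^2})^x} \frac{\lambda}{\sqrt{2\pi}} \exp\left[-\frac{\lambda^2 (x-1)^2}{2}\right] dx$$

$$\mu = \lambda^2 \left(1 - 2 \int_0^{\infty} \frac{\alpha (\alpha^{\lambda^2})^x}{\bar{\kappa} + \alpha (\alpha^{\lambda^2})^x} \frac{\lambda}{\sqrt{2\pi}} \exp\left[-\frac{\lambda^2 (x-1)^2}{2}\right] dx\right) \tag{52}$$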



Defining $\alpha = 1 - s$, it should be noted that,

$$\lim_{\alpha \to 1} e^{-\lambda^2(1-\alpha)} = \lim_{s \to 0} e^{-\lambda^2 s} = \lim_{s \to 0} (e^{-s})^{\lambda^2} = \lim_{s \to 0} (1 - s)^{\lambda^2} = \lim_{\alpha \to 1} \alpha^{\lambda^2} \tag{53}$$

and so, when $\alpha$ is close to 1, the mean fitness of the diploids produced by the haploid fusion
becomes $\alpha^{\lambda^2}$. For a given $\mu$, we expect this to converge to a given quantity in order to allow
for the localization of the population at steady-state.

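As $\lambda \to \infty$, we have that $(\lambda/\sqrt{2\pi}) \exp[-\lambda^2 (x-1)^2/2] \to \delta(x-1)$. So, as $\alpha \to 1$, the
above pair of equations may be written as,

$$1 = 2 \int_0^{\infty} \frac{(\alpha^{\lambda^2})^x}{\bar{\kappa} + (\alpha^{\lambda^2})^x} \delta(x-1) \, dx = \frac{2\alpha^{\lambda^2}}{\bar{\kappa} + \alpha^{\lambda^2}}$$

$$\mu = \lambda^2 \left(1 - \frac{2\alpha\,\alpha^{\lambda^2}}{\bar{\kappa} + \alpha\,\alpha^{\lambda^2}}\right) \tag{54}$$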

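The first equation gives us that $\bar{\kappa} = \alpha^{\lambda^2}$. Substituting into the second equation, we
obtain,

$$\lambda^2 = \mu \frac{1 + \alpha}{1 - \alpha} \tag{55}$$

and so,

$$\bar{\kappa} = \alpha^{\mu \frac{1+\alpha}{1-\alpha}} = e^{\mu (1+\alpha) \frac{\ln \alpha}{1 - \alpha}} \tag{56}$$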

This expression is only valid for $\alpha$ close to 1. Again defining $\alpha = 1 - s$, we then obtain,

$$\lim_{\alpha \to 1} \bar{\kappa} = \lim_{s \to 0} e^{\mu (2 - s) \frac{\ln(1-s)}{s}} = e^{-2\mu} \tag{57}$$
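The approach to the limiting value can be illustrated numerically. The sketch below evaluates the near-$\alpha = 1$ expression $\bar{\kappa} = \alpha^{\mu(1+\alpha)/(1-\alpha)}$ for values of $\alpha$ approaching 1 and compares it with $e^{-2\mu}$; the function name and the sample values are chosen here for illustration, and the expression itself is only expected to be meaningful for $\alpha$ close to 1.

```python
import math


def kappa_near_one(mu, alpha):
    """Near-alpha = 1 approximation: alpha ** (mu * (1 + alpha) / (1 - alpha)).

    Requires 0 < alpha < 1; the formula is singular at alpha = 1 itself.
    """
    return alpha ** (mu * (1.0 + alpha) / (1.0 - alpha))


if __name__ == "__main__":
    mu = 0.5
    for alpha in (0.9, 0.99, 0.999, 0.999999):
        print(alpha, kappa_near_one(mu, alpha), math.exp(-2.0 * mu))
```

The printed values should approach $e^{-2\mu}$ from below as $\alpha \to 1$, because $(1+\alpha)\ln\alpha/(1-\alpha)$ tends to $-2$ in that limit.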



VI. DISCUSSION 

A. The basic mechanism for the selective advantage of sexual reproduction 

The basic mechanism explaining the selective advantage of sexual reproduction over asex- 
ual reproduction and self-fertilization is as follows: If a diploid cell has a homologous pair 
where both genes are non-functional, then, if this cell reproduces either asexually or via 
the self-fertilization pathway, the daughter cells will also have two non-functional genes in 
this homologous pair. The reason for this is that a homologous pair with two non-functional 
genes will produce four non-functional daughter genes. If these four genes are the only genes 
that can produce the corresponding homologous pairs in the daughter cells, as is the case
with asexual reproduction and self-fertilization, then the corresponding homologous pairs in
the daughter cells will have two non-functional genes. 

For sexual reproduction, this is not necessarily the case, since the haploids produced by a 
diploid cell with two non-functional genes in a given homologous pair may fuse with haploids 
produced by a diploid cell containing functional copies of the gene in the same homologous 
pair. This means that the resulting daughter diploid can have a corresponding homologous 
pair with one functional and one non-functional copy of the gene (see Figure 8). This breaks
up the association between two defective genes in a given homologous pair, preventing the 
steady accumulation of non-functional homologous pairs that can occur with the non-sexual 
pathways. 
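The mechanism described above can be illustrated with a minimal sketch. It assumes a toy representation in which each haploid is a list of 0s (functional gene) and 1s (non-functional gene), and it ignores mutation entirely; the function name and the three-gene genomes are hypothetical and serve only as an example.

```python
def double_defective_pairs(hap_a, hap_b):
    """Count homologous pairs in which both gene copies are non-functional.

    Haploids are sequences of 0 (functional) and 1 (non-functional).
    """
    return sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1)

# Parent 1 carries two non-functional copies of gene 0; parent 2 is wild-type.
parent1 = ([1, 0, 0], [1, 0, 0])
parent2 = ([0, 0, 0], [0, 0, 0])

# Asexual reproduction or self-fertilization: both copies come from parent 1.
selfed = double_defective_pairs(parent1[0], parent1[1])

# Sexual reproduction: a haploid from parent 1 fuses with a haploid from parent 2.
outcrossed = double_defective_pairs(parent1[0], parent2[0])

print(selfed, outcrossed)  # prints: 1 0
```

In this toy case the non-sexual daughter inherits the doubly defective pair, while the sexually produced daughter carries one functional and one non-functional copy at that position.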

The explanation is a bit more involved than the basic mechanism given above, however, 
since sexual reproduction with the two-chromosomed genome (i.e. no recombination) has 
a large N mean fitness that approaches the mean fitness of the non-sexual reproduction 
strategies. 

In the absence of recombination, a given chromosome cannot reduce the number of defective
genes. Once $\mu > \ln 2$, then $e^{-\mu} < 1/2$, which means that when a given chromosome
replicates and produces two daughter chromosomes, on average less than one of those daugh- 
ters will be identical to the parent. Since semiconservative replication effectively destroys 
the original parent DNA molecule, the result is a steady accumulation of mutations that 
leads to loss of viability due to genetic drift. The ability for sexual reproduction to break 
up associations between defective genes in a homologous pair may lead to a mean fitness 
that is larger than the mean fitness of the non-sexual strategies for finite N. However, as 
N becomes large this effect steadily disappears, and the result is that sexual reproduction 
in the absence of recombination has no selective advantage over non-sexual reproduction
strategies.

In the case of sexual reproduction with the multi-chromosomed genome, recombination 
allows for the production of daughter cells with fewer defective genes than were present in 
the parent. In the limit of large N, this effect washes out any mutation accumulation effect 
for any value of $\mu$. To understand this, we first note that, in the limit of large N, a given
genome will have a number of defective genes that scales as $\sqrt{N}$.

To see this, we note that the probability that two haploids, each having $n$ non-functional genes,
share at least one position where both genes are non-functional, is given by $1 - \binom{N-n}{n} / \binom{N}{n}$.
Using Stirling's Formula, it may be shown that, in the limit of large N, this probability is
1/2 when $n$ is on the order of $\sqrt{N}$. As a result, haploid fusion will lead to a loss of fitness,
and therefore the purging of deleterious mutations, when the number of non-functional
chromosomes in a genome is on the order of $\sqrt{N}$.
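The $\sqrt{N}$ scaling can be checked numerically. The sketch below evaluates $1 - \binom{N-n}{n} / \binom{N}{n}$, taking $n$ to be the number of non-functional genes in each haploid (the reading under which the $\sqrt{N}$ scaling holds). The function name is hypothetical, and the choice $n \approx \sqrt{N \ln 2}$ follows from the standard large-$N$ approximation $\binom{N-n}{n} / \binom{N}{n} \approx e^{-n^2/N}$, which is an added step rather than a formula stated in the paper.

```python
import math

def overlap_probability(N, n):
    """Probability that two haploids, each with n non-functional genes out of N,
    share at least one position where both genes are non-functional."""
    if 2 * n > N:
        return 1.0  # the two sets of defects must overlap
    # log[C(N-n, n) / C(N, n)] = 2*log((N-n)!) - log((N-2n)!) - log(N!)
    log_ratio = (2.0 * math.lgamma(N - n + 1)
                 - math.lgamma(N - 2 * n + 1)
                 - math.lgamma(N + 1))
    return 1.0 - math.exp(log_ratio)

# The probability is close to 1/2 when n is about sqrt(N * ln 2).
for N in (10**4, 10**6):
    n = round(math.sqrt(N * math.log(2)))
    print(N, n, overlap_probability(N, n))
```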

The defective genes, along with all of the non-defective genes in the genome, segregate 
themselves among four haploid cells, so that each haploid on average has half the number 
of defective genes in the original parent (which is a number that still scales as $\sqrt{N}$). By
the nature of the binomial distribution, the standard deviation for the number of defective
genes in a given haploid scales as $N^{1/4}$. Therefore, out of the four haploids produced, it
may be shown that two will have on the order of $N^{1/4}$ fewer defective genes than would
be expected from a purely symmetric redistribution of genes, and two will have on the
order of $N^{1/4}$ more defective genes than would be expected from a purely symmetric
redistribution of genes. Thus, on average, there is no net accumulation of mutations in the
population. Although each replication cycle introduces an average of $\mu$ mutations per N
template strands from each gene, this effect is washed out by the $N^{1/4}$ fluctuation in the
number of defective genes in the daughter cells due to recombination. While this effect is not
strong enough to prevent a decrease in the mean fitness as $\mu$ increases, it is strong enough
to give a significant advantage to sexual reproduction over other reproduction strategies as
$\alpha \to 1$, and to eliminate the error threshold for $\alpha > 0$.
This washing out effect is illustrated in Figure 9.
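The $N^{1/4}$ fluctuation follows from elementary binomial statistics, as the following sketch shows. It assumes that the parent carries $D = c\sqrt{N}$ defective genes and that each one is passed to a given haploid independently with probability 1/2; the prefactor `c` and the function name are hypothetical.

```python
import math

def segregation_spread(N, c=1.0):
    """Mean and standard deviation of the number of defective genes in one haploid,
    assuming D = c * sqrt(N) defective genes in the parent and independent
    segregation with probability 1/2 (binomial statistics)."""
    D = c * math.sqrt(N)
    mean = D / 2.0
    std = math.sqrt(D * 0.5 * 0.5)  # sqrt(D * p * (1 - p)) with p = 1/2
    return mean, std

# The ratio std / N**(1/4) stays constant, while mu stays fixed as N grows.
for N in (10**4, 10**6, 10**8):
    mean, std = segregation_spread(N)
    print(N, mean, std, std / N ** 0.25)
```

Because the spread grows with $N$ while the average number of new mutations per replication cycle does not, the segregation fluctuations dominate in the large-$N$ limit.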

B. Recombination and the evolutionary basis for the meiotic pathway 

An interesting feature of meiosis, the process by which a diploid cell produces four haploids,
is that the first diploid division is essentially characterized by $r_i = 1$, using the notation
of this paper. The reason for this is that, during the first stage of meiosis, a given chro- 
mosome replicates, and the two daughter chromosomes remain paired together. The two 
homologous pairs of daughters then line up with one another, during which recombination 
can occur, after which each pair of daughter chromosomes segregate into distinct cells. 

We offer the following simple explanation for this segregation mechanism: If the homolo- 
gous pairs of daughters line up in the first stage of meiosis, then, in the second stage, where 
haploid production takes place, the homologous pairs no longer need to find each other, 
since they are already connected. Thus, this haploid production pathway only requires each 
homologous pair of chromosomes to line up with one another in the original parent diploid 
cell. If the daughters of a given parent were not to co-segregate, then each homologous pair 
would have to find one another in each of the two daughter diploids, in order to properly 
form four haploid cells with the haploid complement of genes. This second pathway requires 
twice the number of homologous pair alignments, which takes additional time and energy 
over the first pathway. 

Furthermore, during meiosis, crossover between the homologous pairs occurs, leading to 
an exchange of genes between the homologous pairs, a process known as meiotic recombination.
Meiotic recombination essentially ensures that, although each chromosome contains
numerous genes, the segregation of genes is such that the genes on a given chromosome may 
be derived from either of the two parent chromosomes. The result is that meiotic recombi- 
nation leads to a gene segregation pattern that most closely approximates the segregation 
pattern for the multi-chromosome, sexual pathway considered in this paper. 

C. Sexual reproduction as a stress response in S. cerevisiae 

It should be noted that the results for the sexual reproduction pathways were obtained in
the limit where $\gamma\rho \to \infty$, that is, where the time cost for sex may be assumed to be negligible.
For finite values of $\gamma\rho$, the value of $\bar{\kappa}$ will be reduced. This suggests why unicellular organisms
such as S. cerevisiae engage in a sexual stress response. When conditions are such that the
fitness is high, then the relative value of $\gamma\rho$ is small, i.e., the characteristic time a haploid
spends searching for a mate with which to fuse is large compared to the characteristic 
doubling time, and so the fitness benefit of sex does not outweigh its cost. However, under 
stressful conditions, the fitness may drop to values where the characteristic haploid fusion 
time is small compared to the characteristic doubling time, and so the fitness benefit for sex 
outweighs the costs. For more complex, slowly replicating organisms, it is possible that the 
cost for sex is almost always sufficiently small to keep sex as the optimal strategy. This, 
however, is highly species-dependent, since many classes of organisms are able to reproduce 
both asexually and sexually.

D. Sexual reproduction and the error catastrophe in complex, multicellular organisms

One of the interesting results of our models is that the sexual reproduction pathway for 
the multi-chromosomed genome, in contrast to the other reproduction pathways considered 
in this paper, does not appear to exhibit any kind of error threshold where the mean fitness 
of the population reaches 0 at some critical mutation rate and remains there. For unicellular
organisms, such as S. cerevisiae, where $\mu$ is on the order of 0.01, a non-sexual reproduction
strategy will not lead to the loss of viability in a population, since this value of $\mu$ is far below
the critical value of $\ln 2 \approx 0.69$. In this case, then, S. cerevisiae does not need to reproduce
sexually in order to survive, though sexual reproduction, when it is not too costly, does 
provide an additional fitness boost, and so it makes sense for the organism to maintain the 
pathway in its genome. 
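The error threshold for the non-sexual pathways can be made concrete by evaluating the large-$N$ mean fitness $\max\{2e^{-\mu} - 1, 0\}$ directly. In the sketch below, the function name is hypothetical, and the value $\mu = 1.0$ is an arbitrary illustrative choice above the threshold.

```python
import math

def nonsexual_mean_fitness(mu):
    """Large-N mean fitness at mutation-selection balance, max{2*exp(-mu) - 1, 0}."""
    return max(2.0 * math.exp(-mu) - 1.0, 0.0)

# mu = 0.01 is the order of magnitude quoted for S. cerevisiae;
# ln 2 is the critical value; mu = 1.0 is an arbitrary value above it.
for mu in (0.01, math.log(2), 1.0):
    print(mu, nonsexual_mean_fitness(mu))
```

At $\mu = 0.01$ the mean fitness is reduced by only about 2 percent relative to the wild-type, whereas for any $\mu$ above $\ln 2$ it is zero.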

However, for more complex, multicellular organisms, the value of $\mu$ can greatly exceed
$\ln 2$. For humans, for example, the value of $\mu$ per replication cycle is on the order of 3
(and it is higher if we count $\mu$ to be the average number of point mutations by which the
gamete genomes of a human differ from the original fertilized egg from which the human was 
produced). Although the sexual reproduction pathways considered in this paper were for 
unicellular organisms, the results in this paper nevertheless suggest that sexual reproduction 
is necessary in more complex organisms to prevent the steady accumulation of mutations and 
the loss of viability of the population. While research explicitly modeling asexual and sexual 
reproduction pathways in multicellular organisms is necessary, it is nevertheless interesting 
to note that the production of gametes in multicellular organisms follows a similar meiotic 
pathway to the one that occurs in S. cerevisiae.

E. Masking of deleterious genes, sexual reproduction, and diploidy

One theory for the advantage of sexual reproduction is that it allows for the restoration 
of a wild-type genome by pairing a defective gene in one haploid with a functional gene in 
another haploid. While this "masking effect" has been discussed above, this paper is not the 
first to advance it. This is in fact a relatively old theory to explain the selective advantage 
for diploidy and for sex. However, previous mathematical analyses led to the rejection of 
this theory for the existence of diploidy and sex, while the analysis in this paper shows that 
this masking effect can indeed occur in diploid, sexually reproducing organisms. 

Previous research on sex and diploidy regarded the ability to mask mutations as a func- 
tion of diploidy only. The idea was that sex obtained its selective advantage by making 
use of the masking ability that diploidy presumably confers, thereby providing a selective 
advantage to the strategy. The idea that sex itself was not necessary for the masking ef- 
fect led researchers to first study the hypothesis that diploidy provides a masking effect 
in asexually reproducing organisms. However, as is seen from our earlier analysis, with 
standard asexual reproduction without mitotic recombination ($r_i = 0$), at steady-state the
mutation-accumulation is such that every homologous pair has at least one non-functional 
copy of a given gene. While it is true that these non-functional genes may be masked by a 
functional copy in the homologous pair, there is no apparent advantage over haploidy in this 
case. Furthermore, the haploids produced from such diploid cells would contain a number 
of defective genes that is proportional to the haploid complement of N genes, so that hap- 
loid fusion would produce diploids with a number of homologous pairs lacking a functional 
copy of a given gene that scales with N, leading to diploids of essentially zero fitness, thereby
eliminating any selective advantage for sex.

Indeed, in this paper, we have found that diploidy without sexual reproduction with
recombination does not provide any advantage over haploidy, given the fitness landscapes
considered in this paper (asexually reproducing haploids would also yield a mean fitness of
$\max\{2e^{-\mu} - 1, 0\}$). Diploidy only has an advantage over haploidy when coupled to sexual
reproduction with recombination. The reason for this is that sexual reproduction leads
to a $\sqrt{N}$ scaling in the number of defective genes in the genome, making the masking
effect provided by diploidy possible, since the fraction of deleterious genes goes to zero
with increasing genome size. Combined with recombination, which washes out mutation-
accumulation effects (something that is only possible if the number of deleterious mutations
is much larger than the average number of mutations per replication cycle), the result is the
elimination of the error catastrophe and a selective advantage for sexual reproduction over 
non-sexual forms of reproduction. 

It should therefore be apparent that the selective advantage for sexual reproduction iden- 
tified in this paper shows a very strong connection between diploidy and sexual reproduc- 
tion. Without sexual reproduction, diploidy provides no fitness benefit over haploidy with 
the landscapes considered in this paper. Conversely, without diploidy, sexual reproduction 
only provides a selective advantage under relatively restrictive and problematic assumptions. 
With diploidy, however, we have shown that sexual reproduction can provide a fitness benefit 
over other reproduction strategies. 

This analysis suggests that sexual reproduction and diploidy should have evolved together.
This seems unlikely, however: if these strategies are only advantageous when
present together, the chance that both would randomly evolve simultaneously
appears negligibly small. Nevertheless, because diploidy provides a mechanism for genetic repair
via homologous recombination repair, we argue that diploidy does have an important selective
advantage that is not connected to sex, at least in more slowly reproducing organisms
for which repair of the genome is more important.

One possibility is that diploidy evolved before sexual reproduction, so that sexually repro- 
ducing organisms evolved from asexually reproducing diploid organisms. Another possibility 
is that a form of haploid sex evolved first, whereby two haploid organisms temporarily fused 
to form a diploid organism. The purpose of this fusion was to allow for homologous re- 
combination repair in each of the haploid genomes. Once homologous recombination repair 
was complete, the diploids would divide to form four haploids. At this point, the pur- 
pose of sex would simply be to recover the asexual mean fitness that would exist without 
double-stranded damage in the haploid genomes. This hypothesis is consistent with recent 
experimental work on the multicellular green alga Volvox carteri (Nedelcu et al. 2004).
However, in time, the benefits of diploidy would have caused it to evolve into the dominant 
state of the organismal life cycle, making it possible for sexual reproduction to provide a 
population mean fitness that exceeds that of non-sexual reproduction strategies
in both the haploid and diploid states. 

F. Speculations on the evolution of mitotic recombination 

An important issue connected to the evolution of sexual reproduction is the issue of mi- 
totic recombination, since mathematical models with a different set of assumptions than 
the ones considered here have found that mitotic recombination can often provide an al- 
most identical advantage to sexual reproduction (Mandegar and Otto 2007). The apparent 
discrepancy is that here, we do not assume that a homologous gene pair with a single non- 
functional copy of a gene leads to a fitness penalty, whereas other models do make this 
assumption. If the fitness landscape considered in this paper is closer to the fitness landscapes
of actual genomes, then our modeling suggests that mitotic recombination is simply
not worth the additional time and energy costs involved in finding the homologous pair in 
the cell nucleus. 

Nevertheless, mitotic recombination does occur on occasion. The likely explanation is 
that, while the vast majority of genes in diploid genomes are such that only one functional 
copy is needed to achieve the wild-type fitness, there may be a few genes where there is a non- 
negligible fitness penalty for having even one non-functional copy of a gene in a homologous 
pair. If this fitness penalty is small, then once again it may not be worth the time and 
energy to engage in mitotic recombination. If this fitness penalty is large, then in any event 
genomes with a non-functional copy of the gene will be purged from the population, so that 
mitotic recombination may not be necessary. However, for intermediate values of the fitness 
penalty, it is possible that mitotic recombination is worth the time and energy costs. 

While this discussion on mitotic recombination is speculative at this stage, it should be 
noted that it is known that certain genes are more prone to mitotic recombination than 
others. It is likely that the genes more prone to mitotic recombination are exactly those for 
which mitotic recombination would provide a fitness benefit. 

VII. CONCLUSIONS 

This paper analyzed the evolutionary dynamics associated with three reproduction path- 
ways in unicellular organisms: (1) Asexual reproduction, including mitotic recombination. 
(2) Self-fertilization with random mating. (3) Sexual reproduction with random mating. In 
addition, we considered two different forms of genome organization, to study the effects of 
recombination on the mean fitness for the various reproduction pathways: We considered 
a two-chromosomed genome, whereby the haploid complement of genes was all on a single
chromosome, and we also considered a multi-chromosomed genome, where each gene defined
a separate chromosome, so that the distinct homologous pairs could segregate independently 
of one another. 

We assumed that the purpose of diploidy is to provide genetic redundancy, in particular 
by allowing for the repair of genetic damage due to various mutagens, radiation, and environmental
free radicals. It was assumed that the fitness of a wild-type organism is 1, and
that the fitness is unaffected as long as the organism has at least one functional copy of
every gene. More generally, we assumed that a genome with $l$ homologous pairs lacking a
functional copy of a given gene has a fitness of $\kappa_l$, where $1 = \kappa_0 > \kappa_1 > \cdots > \kappa_\infty = 0$.

We found, for the asexual, self-fertilization, and sexual, two-chromosomed pathways, that 
the mean fitness at mutation-selection balance converges to $\max\{2e^{-\mu} - 1, 0\}$ as $N \to \infty$,
where $\mu$ is the average number of mutations per haploid complement of template gene strands
per replication cycle. This result holds independently of the extent of mitotic recombination
or the organization of the genome. However, for the sexual reproduction pathway with the
multi-chromosomed genome, we found, assuming a multiplicative fitness landscape defined
by $\kappa_l = \alpha^l$, that the mean fitness at mutation-selection balance exceeds the mean fitness of
the other reproduction pathways. This fitness increase is larger the closer $\alpha$ is to 1, while
for $\alpha = 0$ we do not obtain a selective advantage over the other reproduction pathways.

It must be emphasized that the results of this paper do not make any assumption re- 
garding population size, nor is it necessary to assume a dynamic fitness landscape (either 
induced environmentally or due to co-evolutionary dynamics in the case of the Red Queen 
Hypothesis). Furthermore, in contrast to the Deterministic Mutation Hypothesis, we do not 
need to assume that $\mu > 1$, nor do we need to assume synergistic (negative) epistasis. Indeed,
we only explicitly considered the multiplicative fitness landscape in this paper, which
does not exhibit any epistasis. However, we conjecture that our results will hold more generally.
In any event, we believe that the multiplicative fitness landscape considered in this
paper is a more "generic" landscape that more closely approximates the fitness landscapes 
of actual organismal genomes. Essentially, this landscape is obtained by averaging over the 
various fitness penalties associated with knocking out individual genes from the genome, 
and assuming a uniform fitness penalty for each knockout. 

Therefore, this paper developed mathematical models that provide a selective advantage 
for sex under more general and far less restrictive assumptions than previous studies. Given 
that the mathematical models developed here are more realistic than previous models, in that
they explicitly take into consideration semiconservative replication and diploidy, and suggest an
evolutionary basis for meiosis and meiotic recombination, we believe that the work described
in this paper points to a much more satisfying and complete resolution of the question of the 
maintenance of sexual reproduction in diploid organisms, as compared with previous work. 

In this vein, we should point out why we believe that the Deterministic Mutation Hy- 
pothesis and other explanations for the existence of sex require a number of seemingly overly 
restrictive assumptions in order to obtain a selective advantage for the sexual reproduction 
strategy. The basic reason is that previous models for sexual reproduction ignored the role 
of diploidy. Thus, the standard model that was used to analyze sexual reproduction is the 
following: Two parent haploids produce a daughter by contributing copies of their genes. 
The basic mechanism is that for each gene, the daughter receives a single copy from one 
of the parents, so that a given parent has a 50% chance of contributing a given gene to 
the daughter (see Figure 10). While this mechanism in principle allows for the restoration 
of the wild-type genome from two defective parents, in practice each parent contributes an 
average of half of their defective genes to the daughter, so that, on average, the daughter
has as many defective genes as the parents. Furthermore, because we are dealing with
haploid genomes, once a daughter receives a defective gene, it cannot receive a functional
copy of that gene from the other parent and thereby "cover" the mutation. Also, in diploid
organisms reproducing sexually, we showed that the average number of defective genes per
genome scales as $\sqrt{N}$, which, combined with recombination, leads to fluctuations on the
order of $N^{1/4}$ that wash out any mutation-accumulation effects. With haploid genomes, the
average number of defective genes per genome is a finite number that does not scale with
$N$, and so the fluctuations do not wash out any mutation-accumulation effects.
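The statement that, in the standard haploid model, the daughter carries on average as many defective genes as the parents can be illustrated with a short simulation. This is a minimal sketch of the haploid scheme described above (each gene copied from one parent chosen with probability 1/2), with no mutation or selection; the function names, the six-gene genomes, and the number of trials are hypothetical.

```python
import random

def haploid_cross(parent_a, parent_b, rng):
    """For each gene, copy the allele from one of the two parents with probability 1/2.
    Alleles are 0 (functional) or 1 (defective)."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

def mean_defects_in_daughters(parent_a, parent_b, trials=20000, seed=0):
    """Average number of defective genes per daughter over many independent crosses."""
    rng = random.Random(seed)
    total = sum(sum(haploid_cross(parent_a, parent_b, rng)) for _ in range(trials))
    return total / trials

# Two parents with two defective genes each, at different positions.
parent_a = [1, 1, 0, 0, 0, 0]
parent_b = [0, 0, 1, 1, 0, 0]
print(mean_defects_in_daughters(parent_a, parent_b))
```

The average comes out close to 2, the number of defective genes in each parent, so this form of recombination by itself does not reduce the mean mutational burden.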

These various effects, put together, mean that, for sexual reproduction to have a selective
advantage in haploid organisms, it is necessary to introduce additional restrictive
assumptions that are not necessary if diploidy is taken into account. 

Thus, we have argued that in cases where predominantly haploid organisms engage in sex- 
ual reproduction (generally as part of a stress response), then this is in order to temporarily 
form a diploid organism for the purposes of engaging in homologous recombination repair. 
Given the previous work on sexual reproduction with haploid organisms, it is likely that sex 
in this context increases the mean fitness to its value in the absence of double-stranded DNA 
damage. As mentioned in the Discussion, the mean fitness can only be increased further 
with true diploidy and sexual reproduction with recombination. 

We should also point out that, in stochastic simulations of the various models we have 
considered in this paper, we have observed the finite population, Hill-Robertson effect, lead- 
ing to a reduction in the mean fitness of the population beyond what would be expected in an 
infinite population model. It is also true that this effect was smallest for sexual reproduction 
with the multi-chromosomed genome (the only case for which we provided results from the 
stochastic simulations), corroborating previous results by different authors (Keightley and
Otto 2006). However, the extent of the Hill-Robertson effect is strongly dependent on the
value of $\alpha$: The closer $\alpha$ is to 1, the stronger the effect. This being said, we have found that
the Hill-Robertson effect is only appreciable at larger values of $\mu$, where the cutoff for "large"
decreases with increasing $\alpha$. Furthermore, by increasing the population size sufficiently, the
Hill-Robertson effect can be essentially eliminated. For $\alpha = 0.5$, we have found good agreement
between the infinite population results and stochastic simulations for a population size
of 20,000, where we considered asexual reproduction without mitotic recombination for the
multi-chromosomed genome. 

In any event, in previous studies using haploid models for sexual reproduction, the dif- 
ference in fitness between the sexual populations and the asexual populations disappears 
once the population size is sufficiently large. In this work, we find that, by considering the 
role of diploidy, sexual reproduction in the multi-chromosomed genome retains a selective 
advantage over the other reproduction pathways in the infinite population limit. This is 
a significant result, for, as mentioned before, it suggests that sexual reproduction has a 
selective advantage under far less restrictive conditions than previous models indicate. Con- 
sequently, this result also provides an explanation for the persistence of sexual reproduction 
in populations that are not sufficiently small for the Hill-Robertson effect (or other finite 
size effects such as Muller's Ratchet) to be relevant. Given how small unicellular organisms
are, many of which are nevertheless capable of reproducing sexually, and given that there
are approximately $7 \times 10^9$ humans on the planet, such populations may in fact be fairly
common. 

The results of this paper do not explain why a large variety of sexual and mixed asexual- 
sexual strategies are observed (e.g. male-female body size, the sex ratio, male parental care 
versus lack thereof, sperm storage, etc.). While these complex issues are left for future
work, the models presented in this paper nevertheless suggest a basic advantage for sexual
reproduction that is at work in slowly reproducing, complex organisms. The specific form 
that the sexual strategies take may then depend on other parameters that are connected to 
the specific environmental niche that the given species inhabit, and the particular survival 
strategy that is employed. 

As a final note, we are aware that many plant species are not diploid, but contain addi- 
tional copies of their genes (e.g. tetraploid). Future research on the evolution and mainte- 
nance of sex will need to model these organisms, though we suspect that the basic mechanism 
for the advantage of sex obtained by considering diploidy will persist when considering these 
more complex genomes as well. 

Acknowledgments 

This research was supported by a Start-Up Grant from the United States - Israel Bina- 
tional Science Foundation, and by an Alon Fellowship from the Israel Science Foundation. 






APPENDIX A: DERIVATION OF THE EVOLUTIONARY DYNAMICS EQUATIONS FOR ASEXUAL REPRODUCTION



1. Two-chromosomed genome 

The dynamical equations governing the evolution of the asexually replicating,
two-chromosomed unicellular population are given by,



[Eq. (A1) is not recoverable from the extracted text.]

where $\delta_{\{\sigma_1,\sigma_2\},\{\sigma_3,\sigma_4\}} = 1$ if $\{\sigma_1,\sigma_2\} = \{\sigma_3,\sigma_4\}$, and 0 otherwise.

The above equation may be expanded into separate terms, which may then be collected 
and simplified to give, 

[Display equation not recoverable from the extracted text.]

Converting to the ordered strand-pair representation we have, for $\sigma_1 \neq \sigma_2$,

[Display equation not recoverable from the extracted text.]



We also have, 



[Display equation not recoverable from the extracted text.]

and so, converting from population numbers to population fractions, we obtain,




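[Eq. (A5) is not recoverable from the extracted text.]

where $x_{(\sigma_1,\sigma_2)} = n_{(\sigma_1,\sigma_2)}/n$, $n = \sum_{(\sigma_1',\sigma_2')} n_{(\sigma_1',\sigma_2')}$, and $\bar{\kappa}(t) = (1/n)(dn/dt) = \sum_{(\sigma_1,\sigma_2)} \kappa_{(\sigma_1,\sigma_2)} x_{(\sigma_1,\sigma_2)}$.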

To convert this to a set of equations in terms of the $z_{l_{10}, l_{01}, l_{00}}$ population fractions, we
proceed as follows: Given a daughter ordered strand-pair $(\sigma_1, \sigma_2)$ characterized by the
parameters $l_{10}, l_{01}, l_{00}$, and given a parent ordered strand-pair $(\sigma_1', \sigma_2')$, we let $l_{i_1 i_2 j_1 j_2}$ denote
the number of positions where $\sigma_1$ is $i_1$, $\sigma_2$ is $i_2$, $\sigma_1'$ is $j_1$, and $\sigma_2'$ is $j_2$. We then have,



[Eq. (A6) is not recoverable from the extracted text.]



Taking into account degeneracies, we then have. 
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X 



dt - ^ i^^UoiUooKN - ho - /oi - /oo)! 

N—ho~loi—loo N—ho — loi—loo — hiio N—ho—loi — loo—hiio—hioi ho ho— 'loio 'lo — 'loio — 'looi 

EE E E E E 

'iiio=0 iiioi=0 'iioo=0 ^1010=0 iiooi=0 'iooo=0 

^01 ^01— 'oiio 'oi— 'oiio— 'oioi 'oo 'oo— 'ooio 'oQ— 'ooio— 'oooi 

^ ] ^ y ^ y ^ y ^ y ^ ] '^'iioo+'iooo+'oioo+'oooo ^ 

'oiio=0 /oioi=0 'oioo=0 'ooio=0 Zoooi=0 'oooo=0 

'^'lllO+'lOlO+'oilO+'oOlO.illOl+'lOOl+'oiOl+'oOOl .illOO+'lOOO+^OlOO +'()()()() 

( ^ \ /'^— 'iiio— 'lolo— 'oiio— 'oolo'j Z^- 'iiiQ- 'lolo— 'oiio— 'oolo— 'iioi— 'lool— 'oloi— 'oool'' 

v'liio+'ioio+'oiio+'oolo-' V 'iioi+'iooi+'oioi+'oool ' ^ 'iioo+'iooo+'oioo+'oooo 

N — lio — loi — loo\ I N — lio — Iqi — Iqq — Ziiio\ f N — Iiq — Iqi — Zoo ~ ^iiio ~ ^iioi 

^1110 / V ^1101 / V ^1100 

^10 \ fho — hoio\ I ho — how — hooi\ f hi \ (hi — hiio\ (hi — hiio — hioi 
hoioj \ hool J \ ^1000 / \hiioJ \ hioi J \ hioo 
ho \ fho — hoio\ (ho — hoio — hooi \ ^ 

^0010/ \ ^0001 / \ ^0000 / 

„'iiii+hiio+'ioii+'ioio n ^Voiii+'oiio+'ooii+'ooio X ^ 
P \^~P) "/iioi+'iioo+iiooi+'iooo,0 X 

„'iiii+'iiio+'oiii+'oiio n ^Vioii+'ioio+'ooii+'ooio A 

P \^~P) "hioi+hioo+'oioi+'oioo.O 

" ^'^/io!/oi!/oo!(iV-/io-/oi-/oo)! 

N— ho — hn— loo N—ho — hn—loo~hiio Af— ho— 'oi — 'oo— 'iiio— 'iioi 'lo 'lo— 'loio 'lo — 'loio — 'looi 

EE E E E E 

'iiio=0 'iioi=0 'iioo=0 'ioio=0 'iooi=0 'iooo=0 

'oi 'oi— 'oiio 'oi— 'oiio— 'oioi 'oo 'oo— 'ooio 'oo— 'ooio— 'oooi 

^ ] ^ ] ^ ^ ^ ^ ^ ] ^ ^ '^'iioo+'iooo+'oioo+'oooo ^ 

'oiio=0 'oioi=0 'oioo=0 'ooio=0 'oooi=0 'oooo=0 

'^'iiio+'ioio+'oiio+'ooiOi'iioi+'iooi+'oioi+'oooi ,'iioo+'iooo+'oioo+'oooo 

/ JV" \ /AT— (iiio— 'lolo— 'oiio— 'ooio~\ /AT— /iiio— /lolo— 'oiio— 'ooio— 'iioi— 'lool— 'oloi— 'oooi^^ 

V'liio+'ioio+'oiio+'oolO'' ^ 'iioi+'iooi+'oioi+'oool ' ^ 'iioo+'iooo+'oioo+'oooo ' 

N — ho — hi — ho\ I N — ho — hi — ho — hiio\ f N — ho — hi — ho — hiio — hioi 

hiw J \ hioi J \ 'lloo 

^10 \ iho~ ^1010 \ (ho — holo — ^iooi\ / ^oi \ (hi — ^oiio\ (hi — hiio — hioi 



hoioJ V ^1001 / \ ^1000 / \hiioJ \ ^0101 / \ ^oioo 

^00 \ (ho — hoio\ (ho ~ ^0010 — ^oooi\ ^ 

^0010/ V ^0001 / V ^0000 / 

„'iiii+'iiio+'ioii+'ioio n ^Voiii+'oiio+'ooii+'ooio A s/ 
P \^~P) "'iioi+'iioo+'iooi+'iooo,0 X 

„'iiii+'iioi+'oiii+'oioi n ^Vioii+'iooi+'ooii+'oooi A 

P {^~P) "'iiio+'iioo+'oiio+'oioo,0 
After some manipulations, we obtain that,



N—lio—loi—loo lio loi loo 'oo~'ooio ^00— 'ooio~'oooi 
'iiio=0 iioio=0 ioiio=0 iooio=0 ioooi=0 /oooo=0 



y (^1110 + ^1010 + ^0110 + Wo)! /-. _ x2Zinor.n - AV^oio\^(-i _ .^l^ollo. 

tlll0!H010!t0110!f0010! 

{N — /mo — /loio — /quo — /qoio — /qoqi — /qooq)! ^ 

(^10 ~ ^ioio)!(^oi ~ ^oiio)!(^oo ~ ^0010 ~ ^oooi ~ ^oooo)!(-^ ~ ^lo ^ ^oi ~ ^oo ~ /iiio)! 

^^(^\ e)]'^" [e(l g^j'oi-'oiiOg2(/oo— 'ooio— 'oooi— 'oooo) ^2 gj2(A''— Zio— ioi— ^00— hiio) 

ho loi loo ^00— 'oolo ^00— 'ooio— 'oooi 

+2(-'- ^i) ^ y ^ y ^ y ^ y ^ y '^'oooo^'ioio+'ooiOi^oioi+'oooii'oooo 

'ioio=0 ioioi=0 /ooio=0 ioooi=0 ioooo=0 

(/1010 + Wo)! ^^ _ ^yioio^/ooio (Wi + Wi)! ^-^ _ ^^|^olol^^oool ^ 
^ioio!^ooio! ^oioi!^oooi! 

(A^ — /loio — /qioi — /qoio — ^0001 — /oooo)! 

(^10 ^ioio)!(^oi ~ ^oioi)!(^oo ~ ^0010 ~ ^0001 "~ ^oooo)!(-^ ~ ^lo ^ ^oi ~ ^oo)! 

^£(^1 e)]^'^""'^"^" [e(l £jj'oi—'oioig2(ioo—'ooio—^oooi— '0000)^2 g'^2(7V— Zio— 'oi— ^oo) 



which is equivalent to Eq. (1). 

2. Multi-chromosomed genome 

To derive the evolutionary dynamics equations for multi-chromosomed genomes reproducing
asexually, we label the two daughter cells of a given parent as the "left" cell and the
"right" cell. We first determine the probability that a given daughter cell, either left
or right, has a particular genome. Since the homologous pairs segregate into the daughter
cells independently of one another, we may compute the probability of a given segregation
pattern for each homologous pair, and then multiply the appropriate probabilities together
for a given daughter genome.

For this analysis, we consider the left daughter cells only, since the arguments are
analogous for the right daughter cells. We wish to compute the probability $p(rs \to xy)$,
where $rs, xy = 11, 10, 00$, which is the probability that a homologous pair in which one gene
is of type $r$ and the other gene is of type $s$ produces the homologous pair $xy$ in the left
daughter cell. We handle each case in turn:

$11 \to 11$ : Since each daughter chromosome is the daughter of a 1 parent, the probability
that a given daughter chromosome is 1 is $p$, so the probability that both are 1 is $p^2$.

$11 \to 10$ : The probability that a given daughter chromosome is 1 is $p$, and the probability
that a daughter chromosome is 0 is $1 - p$. Since it does not matter which daughter is 1 and
which is 0, we obtain an overall probability of $2p(1 - p)$.

$11 \to 00$ : The probability for this pathway is $(1 - p)^2$.

$10 \to 11$ : The 0 parent always forms two 0 daughters, while the 1 parent may form either a
11, 10, or a 00 daughter pair. In order to form a 11 daughter cell, the 1 parent must produce
a 11 daughter pair, which occurs with probability $p^2$. Furthermore, the two 1 daughters must
co-segregate. Since they are derived from the same parent, this occurs with probability $r_i$.
Finally, the two co-segregating 1 daughters must co-segregate into the left cell, which occurs
with probability 1/2. The overall probability is then $r_i p^2/2$.

$10 \to 10$ : If the 1 parent forms two 1 daughters, then the two 1 daughters cannot
co-segregate, for otherwise this would produce a 11 pair in one cell and a 00 pair in the other
cell. So, we want each 1 to co-segregate with a 0 derived from the other parent gene, which
occurs with probability $1 - r_i$. The probability of this particular segregation pattern is
$(1 - r_i)p^2$.

The 1 parent forms one 1 and one 0 daughter with probability $2p(1 - p)$. This produces
a 10 pair in one cell, and a 00 pair in the other cell, so the probability that the left cell
receives the 10 pair is 1/2, giving an overall probability of $p(1 - p)$.

Adding the probabilities together, we obtain an overall probability of
$(1 - r_i)p^2 + p(1 - p) = p(1 - r_i p)$.

$10 \to 00$ : The probability for this pathway is
$1 - r_i p^2/2 - p(1 - r_i p) = 1 - p(1 - r_i p + r_i p/2) = 1 - p(1 - r_i p/2)$.

$00 \to 00$ : The probability for this pathway is 1.
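
As a sanity check on the case analysis above, the following minimal Python sketch tabulates the left-daughter transition probabilities and compares the 10 row against a direct Monte Carlo simulation of the segregation rules. The function names, the seed, and the sample size are illustrative assumptions and do not appear in the paper; `r` stands for the co-segregation probability $r_i$.

```python
import random

def asexual_pair_transitions(p, r):
    """Left-daughter probabilities p(rs -> xy) for a single homologous pair."""
    return {
        "11": {"11": p**2, "10": 2 * p * (1 - p), "00": (1 - p)**2},
        "10": {"11": r * p**2 / 2,
               "10": p * (1 - r * p),
               "00": 1 - p * (1 - r * p / 2)},
        "00": {"11": 0.0, "10": 0.0, "00": 1.0},
    }

def simulate_10_parent(p, r, trials=200_000, seed=0):
    """Monte Carlo estimate of the 10 -> xy row (hypothetical helper)."""
    rng = random.Random(seed)
    counts = {"11": 0, "10": 0, "00": 0}
    for _ in range(trials):
        # The 1 parent gene yields two daughters, each functional with probability p.
        from_one = [1 if rng.random() < p else 0 for _ in range(2)]
        from_zero = [0, 0]  # the 0 parent gene always yields two 0 daughters
        if rng.random() < r:
            # Co-segregation: both daughters of one parent gene enter the same cell.
            left = from_one if rng.random() < 0.5 else from_zero
        else:
            # Otherwise the left cell receives one daughter from each parent gene.
            left = [rng.choice(from_one), 0]
        counts[{2: "11", 1: "10", 0: "00"}[sum(left)]] += 1
    return {key: value / trials for key, value in counts.items()}

if __name__ == "__main__":
    exact = asexual_pair_transitions(0.9, 0.3)["10"]
    estimate = simulate_10_parent(0.9, 0.3)
    for key in ("11", "10", "00"):
        print(key, round(exact[key], 4), round(estimate[key], 4))
```

Each row of the table sums to one, which is an easy way to catch an algebra slip in any of the six pathways.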

Given a daughter diploid characterized by the parameters $l_{10}$, $l_{00}$, and given a parent
diploid, let $l_{i_1 i_2 j_1 j_2}$ denote the number of homologous gene pairs where the daughter is
$i_1, i_2$ and the parent is $j_1, j_2$. The probability that the parent diploid produces the daughter
diploid as the left daughter is,

$$
p^{2 l_{1111}} [2p(1-p)]^{l_{1011}} (1-p)^{2 l_{0011}} \left(\frac{r_i}{2} p^2\right)^{l_{1110}} [p(1 - r_i p)]^{l_{1010}} \left[1 - p\left(1 - \frac{r_i}{2} p\right)\right]^{l_{0010}} \delta_{l_{1100}+l_{1000},\,0}
\tag{A8}
$$
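
The product structure of Eq. (A8) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys such as `"1110"` are a hypothetical encoding of the indices $i_1 i_2 j_1 j_2$, and `left_daughter_probability` is a made-up name, not something defined in the paper.

```python
def left_daughter_probability(l, p, r):
    """Evaluate the per-genome probability of Eq. (A8).

    l maps index strings "i1 i2 j1 j2" (daughter pair, then parent pair)
    to the number of homologous pairs of that kind; missing keys count as 0.
    """
    def n(key):
        return l.get(key, 0)

    # Kronecker delta: a 00 parent pair can never give a daughter pair with a 1.
    if n("1100") + n("1000") > 0:
        return 0.0
    return ((p**2) ** n("1111")
            * (2 * p * (1 - p)) ** n("1011")
            * ((1 - p)**2) ** n("0011")
            * (r * p**2 / 2) ** n("1110")
            * (p * (1 - r * p)) ** n("1010")
            * (1 - p * (1 - r * p / 2)) ** n("0010"))

if __name__ == "__main__":
    # Two 11 pairs copied without error, p = 0.9: probability p^4.
    print(left_daughter_probability({"1111": 2}, 0.9, 0.5))
```

Pairs of type 0000 contribute a factor of 1 and therefore do not appear in the product.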

Taking into account degeneracies, we obtain that the evolutionary dynamics equations 
are then. 



N—lio—loo N— ho — loo— lino ho ho— 'loio 'oo 'oo- 'ooio 

Yl Yl J2 J2 J2 '^^^ 

loo+'iooo+'oooo 

hiio=0 hioo=0 hoio=0 hooo=0 iooio=0 /oooo=0 

'^hllO+hoiO +^0010 -hlOO+hoOO +^0000 ^ 

/ W \ /jV-/iiio-iioio-'ooio'\ 

vhiio+hoio+^0010'^ V 'iioo+'iooo+'oooo ' 

N — ho ~ loo\ ( N — ho — loo ~ hiio\ ( ho \ (ho — hoio\ ( loo \ (loo — ^ooio 



^1110 / V ^1100 / V^ioioy V ^1000 / \looioJ \ ^oooo 



X 
-'V— ^10— ^00 ^10 ^00 ^00 — ^0010 

'^'oooo'^'iiio+'ioio+'ooiOj'oooo 

^1110=0 ^1010=0^0010=0 ^0000=0 



tlll0!tlOlo!f0Olo! ^ ^ 

(A^ — hiio — ^1010 — /qoio — ^oooo)! ^ 

(^10 ~ ^101o)K^OO ~ ^0010 ~ ^oooo)K'^ ~ ^10 ~ ^00 ~ ^iiio)! 
|'2£^]^ g-^j'io— iioiOg^2(ioo— 'oolo— 'oooo)^2 g^^2(W— iio— 'oo— iiiio) 

(A9) 

which is identical to Eq. (15). 



APPENDIX B: DERIVATION OF THE EVOLUTIONARY DYNAMICS 
EQUATIONS FOR SELF-FERTILIZATION FOR THE MULTI-CHROMOSOMED 
GENOME 

To develop the evolutionary dynamics equations for self-fertilization with random mating, 
we proceed as follows: Given a parent diploid cell, we assume that it splits into a left diploid 
and a right diploid. The left diploid then splits into two haploids, haploid 1 on the left and 
haploid 2 on the right, while the right diploid also splits into two haploids, haploid 3 on the 
left and haploid 4 on the right. 

We then have the following pairings, all with equal probability because of random mating:
(1) $1 \leftrightarrow 2$, $3 \leftrightarrow 4$. (2) $1 \leftrightarrow 3$, $2 \leftrightarrow 4$. (3) $1 \leftrightarrow 4$, $2 \leftrightarrow 3$. Each of the three
possible pairing schemes has a probability of 1/3 of occurring.

We may consider each pairing scheme in turn. Our goal is to determine, for a given 
parent diploid, what is the probability of obtaining a specific daughter diploid as the left 
daughter cell. 

We consider the various probabilities in order.

$1 \leftrightarrow 2$, $3 \leftrightarrow 4$

$11 \to 11$ : If a homologous pair in the parent diploid is 11, then each daughter gene in the
final left diploid is the daughter of a 1 parent. Since the probability that a given daughter
of a 1 parent is itself 1 is $p$, the probability that both daughters are 1 is $p^2$.

$11 \to 10$ : As with the previous case, the probability that a given daughter of a 1 parent is
itself 1 is $p$, while the probability that the daughter is 0 is $1 - p$. Therefore, the probability
that a given daughter of a 1 parent is 1 and the other daughter of a 1 parent is 0 is $p(1 - p)$.
Since it does not matter which daughter is 1 and which is 0, we obtain a total probability
of $2p(1 - p)$.

$11 \to 00$ : The probability of this pathway is $1 - p^2 - 2p(1 - p) = (1 - p)^2$.

$10 \to 11$ : The probability that a 10 pair produces two 1 daughters and two 0 daughters
is $p^2$. Since these two 1 daughters are from the same 1 parent, the probability that they
co-segregate into the left diploid is $r_i/2$, giving a total probability of $r_i p^2/2$.

$10 \to 10$ : The probability that a 10 pair produces two 1 daughters and two 0 daughters is $p^2$.
Since these two 1 daughters are from the same 1 parent, and since the two 0 daughters are
from the same 0 parent, the only way to obtain a 10 left daughter cell is for the daughter
chromosomes of a given parent to not co-segregate. Since this occurs with probability $1 - r_i$,
we obtain an overall probability of $(1 - r_i)p^2$.

The probability that a 10 pair produces one 1 daughter and three 0 daughters is $2p(1 - p)$.
Since the probability that the 1 chromosome ends up in the left daughter cell is 1/2, we
obtain an overall probability of $p(1 - p)$.

The total probability is then $(1 - r_i)p^2 + p(1 - p) = p(1 - r_i p)$.

$10 \to 00$ : The probability for this pathway is $1 - r_i p^2/2 - p(1 - r_i p) = 1 - p(1 - r_i p/2)$.

$00 \to 00$ : Because of the neglect of backmutations, this occurs with probability 1.



$1 \leftrightarrow 3$, $2 \leftrightarrow 4$

$11 \to 11, 10, 00$ : Following a similar line of reasoning to the one used above, we obtain an
identical corresponding set of transition probabilities.

$10 \to 11$ : The 1 parent must produce two 1 daughters, which occurs with probability $p^2$. These
1 daughters must segregate into distinct diploids, which occurs with probability $1 - r_i$. The
probability that these 1 daughters then end up in haploids 1 and 3 respectively is 1/4, for a
total probability of $(1 - r_i)p^2/4$.

$10 \to 10$ : The 1 parent produces two 1 daughters with probability $p^2$, while the 0 parent
produces two 0 daughters with probability 1. If the 1 daughters and the 0 daughters each
co-segregate, which occurs with probability $r_i$, then the 1 haploid and the 3 haploid will together
form a 10 pair. If the daughters of each parent do not co-segregate, with probability $1 - r_i$,
then we form two 10 diploids. The probability that the 1 haploid has a 1 and the 3 haploid
a 0 is 1/4, and the probability that the 1 haploid has a 0 and the 3 haploid a 1 is 1/4, giving
an overall probability of $p^2(r_i + (1 - r_i)/2) = (1 + r_i)p^2/2$.

The 1 parent produces one 1 daughter and one 0 daughter with probability $2p(1 - p)$.
The probability that this 1 daughter ends up in either haploid 1 or 3 is 1/2, for an overall
probability of $p(1 - p)$.

The total probability is then $p[1 - p + (1 + r_i)p/2] = p[1 - (1 - r_i)p/2]$.

$10 \to 00$ : The probability of this pathway is
$1 - (1 - r_i)p^2/4 - p(1 - (1 - r_i)p/2) = 1 - p[1 - (1 - r_i)p/4]$.

$00 \to 00$ : The probability for this pathway is simply 1.

$1 \leftrightarrow 4$, $2 \leftrightarrow 3$

This case is symmetric to Case 2, so all of the probabilities are identical.
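
The three mating schemes can be tabulated side by side to confirm that each set of pathways is normalized. The sketch below is an illustration only; the function name and the scheme labels `"12_34"`, `"13_24"`, `"14_23"` are assumptions introduced here, and `r` again denotes $r_i$.

```python
def selfing_pair_transitions(p, r):
    """Per-pair left-daughter probabilities for each of the three mating schemes."""
    from_11 = {"11": p**2, "10": 2 * p * (1 - p), "00": (1 - p)**2}
    from_00 = {"11": 0.0, "10": 0.0, "00": 1.0}
    # Scheme 1 <-> 2, 3 <-> 4: haploids from the same intermediate diploid re-fuse.
    within = {
        "11": dict(from_11),
        "10": {"11": r * p**2 / 2,
               "10": p * (1 - r * p),
               "00": 1 - p * (1 - r * p / 2)},
        "00": dict(from_00),
    }
    # Schemes 1 <-> 3, 2 <-> 4 and 1 <-> 4, 2 <-> 3: haploids from different diploids fuse.
    across = {
        "11": dict(from_11),
        "10": {"11": (1 - r) * p**2 / 4,
               "10": p * (1 - (1 - r) * p / 2),
               "00": 1 - p * (1 - (1 - r) * p / 4)},
        "00": dict(from_00),
    }
    return {"12_34": within, "13_24": across, "14_23": across}

if __name__ == "__main__":
    for scheme, table in selfing_pair_transitions(0.95, 0.2).items():
        print(scheme, {k: round(v, 4) for k, v in table["10"].items()})
```

Because the same mating scheme applies to every homologous pair of a genome, the 1/3 weights belong at the level of the whole-genome product rather than pair by pair. In this sketch the two tables coincide at $r_i = 1/3$, since then $r_i/2 = (1 - r_i)/4$.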

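Given a diploid parent and a diploid daughter cell, where the daughter is characterized
by $l_{10}$, $l_{00}$, let $l_{i_1 i_2 j_1 j_2}$ denote the number of positions where the daughter is $i_1, i_2$ and the
parent is $j_1, j_2$. The probability that the parent diploid produces the daughter diploid as
the left daughter cell is then,

$$
p^{2 l_{1111}} [2p(1-p)]^{l_{1011}} (1-p)^{2 l_{0011}} \left(\frac{r_i}{2} p^2\right)^{l_{1110}} [p(1 - r_i p)]^{l_{1010}} \left[1 - p\left(1 - \frac{r_i}{2} p\right)\right]^{l_{0010}} \delta_{l_{1100}+l_{1000},\,0}
$$

for the $1 \leftrightarrow 2$, $3 \leftrightarrow 4$ mating pattern, and

$$
p^{2 l_{1111}} [2p(1-p)]^{l_{1011}} (1-p)^{2 l_{0011}} \left(\frac{1 - r_i}{4} p^2\right)^{l_{1110}} \left[p\left(1 - \frac{1 - r_i}{2} p\right)\right]^{l_{1010}} \left[1 - p\left(1 - \frac{1 - r_i}{4} p\right)\right]^{l_{0010}} \delta_{l_{1100}+l_{1000},\,0}
\tag{B1}
$$

for the $1 \leftrightarrow 3$, $2 \leftrightarrow 4$ and $1 \leftrightarrow 4$, $2 \leftrightarrow 3$ mating patterns.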

Taking into account degeneracies and the probabilities for the various mating patterns, 
we obtain. 



X 



dt - ^'"'^ + ^W)ziroM + 'i^^UooKN - ho - Zoo)! 

N— ho— loo N—ho—loo—hiio ho ho—hoio loo 'oo— 'ooio 

E\ ^ \ ^ \ ^ \ ^ \ ^ ^^11 10 ,^1100 +^1000 +^0000 

2-^2-^2-^2-^ «'/iioo+hooo+/oooo"7 N \ /AT-iiiio-^ioio-'ooioN 

^1110=0 '1100=0 ^1010=0 /iooo=0 ^0010=0 /oooo=0 V'liio+'ioio+'ooio' ^ 'iioo+'iooo+'oooo / 

N — lio — loo\ ( N — liQ — Zoo ^ Ziiio\ / /lo A fho — ^loioA / Zqo \ (Iqo ~ Zooio 
Ziiio / \ him ) V^ioio/ \ Ziooo / VZqoio/ V ^0000 

^2 4 

^hioo+hooo,o] 
2 N— ho— loo ho loo ^00—^0010 

= -{1^100 + '^{'t))ziio,loo + 3 2Z XI XI X] 

'iiio=0 ^1010=0^0010=0 ^0000=0 

(^1110 + ^1010 + ^ooio)! ^ 
^iiio'^ioio'^ooio' 

[[-(1 - e)2]'"io[l - e - ri(l - eff'°'°[e + -(1 - eff'°'° 

{N — Ziiio — /loio ~ ^0010 ~ ^oooo)! ^ 
(^10 — ^ioio)!(^oo — ^0010 — ^oooo)K-^ ~ ^10 ~ ^00 — ^iiio)! 

pg^]^ g^jho— hoiOg2(Zoo— ^0010— 'oooo) g^2(W— Jio— 'oo— iiiio) 



(B2) 



which is identical to Eq. (23). 



APPENDIX C: DERIVATION OF THE EVOLUTIONARY DYNAMICS EQUA- 
TIONS FOR SEXUAL REPRODUCTION 

1. Two-chromosomed genome 

For sexual reproduction with random mating, the dynamical equations are, 

-K'{ai,a2}n{ai,a2} + ( ^ ) "-(ti riaa , C^l 7^ Cr2 

1 7 

fi{a,a}n{a,a} + 2^yWa (CI) 



Jt 

dt 
'*{o-l,<72}^{o-l,0-2} ^ 



dt 

5^X1X1 0-12)p(0-2, 0-2i)p((72, 0-22) X 

(Til <T12 <T21 Cr22 

1 \ — ^ 

= -( — )n^n// + 2 fi{ai,a2}'n{ai,a2}\p{'^l,'^)+P{'^2,(T)] 

{o"l,0"2} 

= -( — )n<,nH + 4 2^ «(ai,a2)^(ai,a2)b(^^l>^^) +P(^^2,0-)] 

{0'1,0"2},0'17^(T2 

+4 fi;((,/,(,/)n(<,/,a/)p((T', a) 

{<T',<7'} 

= -( — )n<,nH + 4 2^ fi;(ai,a2)^(ai,a2)P(t^l,t^) (C2) 

(l7l,<T2) 

Defining the diploid ordered strand-pair population fractions via $x_{(\sigma_1,\sigma_2)} = n_{(\sigma_1,\sigma_2)}/n$,
and the haploid population fractions via $x_\sigma = n_\sigma/(2n)$, we obtain, after converting from
population numbers to population fractions, and using the fact that $\rho = n/V$, the dynamical
equations,



■^{t)Xa-2-fpXaXH + '2 ^ l^{aua2)X{ai,a2)P{'^l: ^r) 

(o-l>0-2) 



(C3) 



To develop the evolutionary dynamics equations in terms of the $z_{l_{10}, l_{01}, l_{00}}$ and $z_{l_0}$, we
proceed as follows: Given a haploid with genome $\sigma$, let $l_1$ and $l_0$ denote the number of
positions where $\sigma$ is 1 and 0, respectively. Given some $(\sigma_1, \sigma_2)$, let $l_{i j_1 j_2}$ denote the number
of positions where $\sigma$ is $i$, $\sigma_1$ is $j_1$, and $\sigma_2$ is $j_2$. We then have,

$$
p(\sigma_1, \sigma) = p^{\,l_{111}+l_{110}} (1-p)^{\,l_{011}+l_{010}} \, \delta_{l_{101}+l_{100},\,0}
\tag{C4}
$$

The evolutionary dynamics equations for the diploid population fractions $z_{l_{10}, l_{01}, l_{00}}$
are then given by,
* ^ " "° ' 'io!/o.!/„o!(.V - U - /oi - Zoo)! (, J,J (, J,J 

/ , -/.\\ , n (^10 + ^Oo)K^01 + ^Oo)! 

= -(«ioo + i^WJ^hoMM + 27P ,,,,,, X 

(AT -Zq, -/„„)! (Ar-/,o-/oo)! 

/ , -/.\\ , o (^10 + ^oojK^oi + ^oo)! 

= -(/^/oo + '«(^))^/io,/oi,/oo + 27P , „ „ , X 

tl0!t0l!f00! 

/"TT N — liQ — Ipi — Iqq + fe w-TT 1 V 

TV - - Zoo + A; ^ ^^J^ N-I00 + ;t 

(C5) 



which is identical to the first equation in Eq. (28). 

Taking into account the transition probabilities and various degeneracies, then for the
haploids, we have,

hio=0 /ioi=0 hoo=0 ^010=0 /ooi=0 /ooo=0 



X 



X 



dt 

^hi(i+^oio,h()i+^ooi,/ioo+^ooo 

7 N \ ^N-liw-loio\ ^N-hiQ-h)w-hoi-looi\ 
Vhio+^oio' ^ 'loi+'ooi ' V hoo+'ooo ' 

N — lo\ ( N — Iq — /iio\ f N — Iq — /no — /ioi\ / ^0 \ f^o ~ ^oio\ mo ~ ^010 ~ ^001 
^110 / V ^101 / \ ^100 / V^oio/ V ^001 / V ^000 

N—lo lo 'o— 'oio 'oio— 'ooi 

= -R{t)ziQ - 2'ypzHZio + 2 ^ ^ ^ ^ i^iooo^hio+hioMiMo ^ 

'110=0/010=0 iooi=0 /()()o=0 
{hw + kwV- ^^ _ ^^^^no^^OlO (^-^110-/010-/001-/000)! ^;o_;„^„_;„„^_;„„„^^ _ ^^N-lo-hio 

/iio!/oio! (/q — /qig — /qoi — /ooo)!(-^ — — /no)! 

(C6) 



which is identical to the second equation in Eq. (28).

2. Multi-chromosomed genome

To derive the quasispecies equations for sexual replication with random mating for the
multi-chromosome case, we proceed as follows: We assume that a diploid produces four
haploids that may be lined up and labelled "1", "2", "3", "4". We wish to determine
the probability that haploid "1" receives a certain genome from a given parent diploid.
As with the asexual case, since the homologous pairs segregate independently of one
another, we may consider the probabilities of the various segregation patterns for a given
homologous pair. We consider each case in turn.

$11 \to 1$ : If a given homologous pair in a parent diploid is 11, then the corresponding gene
in the daughter haploid labelled "1" is the daughter of a 1 parent, so the probability that
this daughter is itself a 1 is $p$. Therefore, the $11 \to 1$ probability is simply $p$.

$11 \to 0$ : Following a similar argument to the one given above, we obtain that the $11 \to 0$
probability is $1 - p$.

$10 \to 1$ : If a given homologous pair in a parent diploid is 10, then since a 0 parent gene
produces two 0 daughters, the corresponding gene in the daughter haploid labelled "1"
can only be 1 if it is the daughter of the 1 parent. By the symmetry of the chromosome
segregation, the probability that the haploid gene is the daughter of the 1 parent is 1/2.
Since the probability that a daughter of the 1 parent is itself a 1 is $p$, we obtain an overall
probability of $p/2$.

$10 \to 0$ : Since the probability of the $10 \to 1$ pathway is $p/2$, the probability of the $10 \to 0$
pathway is $1 - p/2$.

$00 \to 0$ : The probability of this pathway is 1.
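
Because the homologous pairs segregate independently, the per-pair probabilities above multiply to give the probability that haploid "1" receives a given genome. The Python sketch below illustrates this; the function names and the string keys (haploid value followed by the parent pair, e.g. `"110"`) are assumptions made for this example rather than notation taken from the paper.

```python
def haploid_pair_transitions(p):
    """Probabilities that haploid "1" carries a 1 or a 0 at one position."""
    return {
        "11": {"1": p, "0": 1 - p},
        "10": {"1": p / 2, "0": 1 - p / 2},
        "00": {"1": 0.0, "0": 1.0},
    }

def haploid_probability(l, p):
    """Product of per-pair probabilities over a whole genome.

    l maps "i j1 j2" strings (haploid value, then parent pair) to counts;
    missing keys count as 0.
    """
    def n(key):
        return l.get(key, 0)

    if n("100") > 0:
        return 0.0  # a 00 parent pair cannot produce a functional gene
    return (p ** n("111") * (1 - p) ** n("011")
            * (p / 2) ** n("110") * (1 - p / 2) ** n("010"))

if __name__ == "__main__":
    print(haploid_probability({"111": 1, "110": 1, "010": 1}, 0.8))
```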
Suppose a diploid is characterized by the parameters $l_{10}$, $l_{00}$. Suppose that two haploids,
with sequences $\sigma_1$ and $\sigma_2$, fuse. If $\sigma_1 \neq \sigma_2$, then the diploid production rate is given by
$(\gamma/V) n_{\sigma_1} n_{\sigma_2}$, while if $\sigma_1 = \sigma_2$, then the diploid production rate is given by $(1/2)(\gamma/V) n_{\sigma_1} n_{\sigma_2}$.

If we let $\sigma = (\{s_{11}, s_{12}\}, \ldots, \{s_{N1}, s_{N2}\})$ denote the genome of the diploid, where
$\{s_{i1}, s_{i2}\} = \{1, 1\}, \{1, 0\}, \{0, 0\}$, and if we let $\sigma'$ denote the genome formed by the fusion of
haploids with genomes $\sigma_1$ and $\sigma_2$, then we have,

dria .7 , 1 7 2 

(o'i,o-2),o"'=(3- 

Now, where $\sigma$ is $\{1, 1\}$, we must have that both $\sigma_1$ and $\sigma_2$ are 1. Where $\sigma$ is $\{0, 0\}$, we
must have that both $\sigma_1$ and $\sigma_2$ are 0. Where $\sigma$ is $\{1, 0\}$, we must have that $\sigma_1$ is 1 and $\sigma_2$ is
0, or $\sigma_1$ is 0 and $\sigma_2$ is 1. Let $l$ denote the number of spots where $\sigma_1$ is 1 and $\sigma_2$ is 0. Since
we want the fusion of $\sigma_1$ and $\sigma_2$ to produce $\sigma$, the number of spots where $\sigma_1$ is 0 and
$\sigma_2$ is 1 is $l_{10} - l$.
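
The counting behind this argument can be checked by brute force on a small genome. The sketch below enumerates every ordered pair of haploids, keeps those whose fusion reproduces a chosen diploid, and groups them by $l$. The function name and the example diploid are illustrative assumptions; the expected degeneracy $\binom{l_{10}}{l}$ is simply the number of ways to choose which heterozygous positions carry the 1 in the first haploid.

```python
from itertools import product
from math import comb

def count_fusing_pairs(diploid):
    """Count ordered haploid pairs (s1, s2) that fuse into `diploid`.

    diploid is a list of pairs such as (1, 1), (1, 0), (0, 0); order within
    a pair is ignored. Returns {l: count}, where l is the number of
    positions with s1 = 1 and s2 = 0.
    """
    n = len(diploid)
    target = [tuple(sorted(pair)) for pair in diploid]
    counts = {}
    for s1 in product((0, 1), repeat=n):
        for s2 in product((0, 1), repeat=n):
            fused = [tuple(sorted(ab)) for ab in zip(s1, s2)]
            if fused == target:
                l = sum(1 for a, b in zip(s1, s2) if (a, b) == (1, 0))
                counts[l] = counts.get(l, 0) + 1
    return counts

if __name__ == "__main__":
    example = [(1, 1), (1, 0), (1, 0), (0, 0), (1, 0)]  # l10 = 3, l00 = 1
    counts = count_fusing_pairs(example)
    for l in sorted(counts):
        print(l, counts[l], comb(3, l))
```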

Taking into account degeneracies, and converting from population numbers to population 
fractions, we then have. 



dt 



/ , , r, (Z + Zoo)K^io ~ Z + ^oo)! 

= -(«ioo + l^{t))^lm,loo + 27P 2^ 7771 WTT-, X 

fe=l A;=l 

which is identical to the first equation in Eq. (41). 

To derive the haploid equations, suppose a haploid is characterized by the parameter $l_0$.
Given some parent diploid, let $l_{i j_1 j_2}$ denote the number of positions where the haploid is $i$
and the diploid is $j_1, j_2$. We then have a total transition probability of,



and so, taking into account degeneracies, we obtain. 



J 1.7-1 N—lo N—lo—liio lo lo—loio 

^110=0 hoo=0 (oio=0 iooo=0 v'lio+'oiO'' ^ 'loo+'ooo ' 



U^n J V ^inn / \^oio/ \ ^ooo / 2 2' 

N—Iq Iq i()io 



jv— H) H) H) — H)W /7 I / I 1 1 _|_ 



110=0 Joio=0 'ooo=0 

g'o— 'OIQ— ^000 g-^-W— (q— (no (CIO) 



(N - I no - lolo - h}(K)V- io-ioio-iooon A^-io-hio 



{Iq — low ~ loooV-i-^ — lo ~ hioV- 
which is identical to the second equation in Eq. (41). 

APPENDIX D: DERIVATION OF THE DYNAMICAL EQUATIONS FOR
$w_l(\beta_1, \beta_2, t)$ FOR ASEXUAL REPRODUCTION IN THE TWO-CHROMOSOMED
GENOME

For asexual reproduction in the two-chromosomed genome, we have,
N-l N-l-h N-l-h-h h h I 1-141-14-15 



dt 



li=0 l2=0 l[=0 l'^=0 l's=0 1'^=0 l'g=Q 1'q=0 



iN-l[^l',-l',^l',-l',-l',)\ 

(^1 - I'.y.ik - I'M WN -l-h-h- /;)! 

[/?ie(l - e)f'-^(32e{l - e)]^^-%^y-''^~''^-''^[{l - e)2]Af-^-h-i2-/i 

N-l N-l-h h h I '-^3 '-'3-^4 

«(i-.)E E EEEE E 

^1=0 ^2=0 /;=o ii^=o /^-o 1'^=^ i'r=Q 
(A^ - /'l - /[, - /^j - /\ - /^)! 

{k - mk - mi -I3- 14-mN- i-k- ky. 



= -{ki + K{t))wi 

I N-l N-l-l[ N-l-l[-l'2 N-l-l[-l'2-l'3 N-l-l[-l!2-l'3-ili-l'2) 



+2^.EEEEE E E 

^.=01^=0 i'4=o i{=o i'2=o 13=0 h-l'2=0 

+ + + 



E 

l2-l'^=0 



-[(1 - e)^]''4Ml - e)]'HMl - e)f3(e')'^ X 



(N-k-l',-l',-l',-l',-Q\ 



X 



(k - mk - mi -I4-1',- mN -i-k-k- kv. 

; l-l'^ 1-1'4-l'r, N-l N-l-l{ N-l-l[-l'.^ N-l-l[-l!2-{li-l[) 

+2{i -r^)Y,Yl Yl Y '^v^^ww^'^ 

l'^=Ql'^=Q l'^=0 l'^=Q l'^=Q h-l'^=0 l2-l'2=0 

{N-k-k-l's-k-l'.V- 



ik - kv-ik - mi -13-14- mN -i-k- kv- 

[/3ie(l - e)]'i-''i[/32e(l - e)]'^-'2(e')'-'3-'4-'6[(i _ ^^^2^N-i-h-i2 
N-l N-l-l[ N-l-l{-l'2 N-l-l[-l'2-l's N-l-l[-l'2-l's-{li-l'2) 

>-{ki + R{t))wi + 2riKi ^ ^ ^ ^ii+i'2+i'3fi,i X 

i[=o z^=o i'^=o ii-i^=o 12-13=0 



Ar_i Ar-Z-;^ N-l-l'^-l'^ N-l-l'^-V^-ili-l'^) 

+2(1 - r,)K, E E E E - ^)]'4/?2(l - 6)]^^ X 

Z^=0 Z^=0 h-l[=0 l2-l'2=0 

{N-l-l[-l',)\ 



{ii-mk-QKN -i-k-ky. 



[/?ie(l - e)]'i-'i[/32e(l - e)]'^-'^[(l - e)Y''-''-'' 



= ft:,[2((/3i+/32)6(l-6) + (l-e)2)^-'x 

-^(i)wK/9i,/92,i) (Dl) 
where strict equality holds for $l = 0$, or if $z_{l_1, l_2, l_3} = 0$ for $l_3 < l$.

APPENDIX E: MATHEMATICAL DETAILS FOR THE SOLUTION OF THE 
SEXUAL REPRODUCTION PATHWAYS IN THE LIMIT OF LARGE N 

1. Two-chromosomed genome 

We begin our analysis by deriving the limiting form of the expression, 



(k + h)'-(k + (3)! -k-h-h + k '' 



11 N-L-U^h ll/V-Zo-^i• ' 



imh\ i\ N-h-h + k ^^N-h + k 

k=l k=l 



in the limit of large $N$, under the assumption that $l_1, l_2$ scale as $\sqrt{N}$, and $l_3$ is finite as
$N \to \infty$.

We begin by rewriting the expression as,

In the limit of large $N$, we obtain,




And so, as is given in Eq. (35), we have, 

k=0 l4=0 h=0 ^ 

Now, in the limit of large $N$, we have observed from simulations that the $z_l$ converge to
a Gaussian distribution with a mean that is proportional to $\sqrt{N}$ and a standard deviation
that is proportional to $N^{1/4}$. While this observation is not a proof, we may nevertheless
make the ansatz that the $z_l$ do indeed converge to a Gaussian in the limit of large $N$, and
see whether this allows us to solve for the steady state of this reproduction pathway. If this
ansatz leads to a self-consistent set of equations that may be used to solve for the steady
state in the limit of large $N$, then we take it to be a correct assumption.

If the mean of the Gaussian scales as $\sqrt{N}$, then we may write that the mean of the
Gaussian is given by $\lambda\sqrt{N}$. If the standard deviation of the Gaussian scales as $N^{1/4}$, then
we may write that the standard deviation is $\gamma N^{1/4}$. As a result, we may transform from a
discrete representation in terms of the $z_l$ into a continuous representation, denoted by $p(x)$,
where $x = l/\sqrt{N}$. Conservation of probability implies that $z_l = p(x)/\sqrt{N} \Rightarrow p(x) = \sqrt{N} z_l$.

In these re-scaled coordinates, the Gaussian has a mean of $\lambda$ and a standard deviation of
$\gamma N^{-1/4}$. As a result, we obtain that,

$$
p(x) = \frac{N^{1/4}}{\gamma\sqrt{2\pi}} \exp\left[-\frac{\sqrt{N}}{2\gamma^2}(x - \lambda)^2\right]
\tag{E5}
$$
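
The rescaling between the discrete profile $z_l$ and the continuous density $p(x)$ can be checked numerically. The sketch below builds $z_l = p(l/\sqrt{N})/\sqrt{N}$ from the Gaussian ansatz and confirms that it is normalized with mean close to $\lambda\sqrt{N}$ and standard deviation close to $\gamma N^{1/4}$. The function names and the parameter values ($N = 10^4$, $\lambda = \gamma = 1$) are arbitrary choices for illustration, not values from the paper.

```python
import math

def gaussian_density(x, lam, gamma, N):
    """Gaussian ansatz p(x) with mean lam and standard deviation gamma * N**(-1/4)."""
    sigma = gamma * N ** -0.25
    return math.exp(-(x - lam) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def discrete_profile(N, lam, gamma):
    """z_l = p(l / sqrt(N)) / sqrt(N) for l = 0, ..., N."""
    root_n = math.sqrt(N)
    return [gaussian_density(l / root_n, lam, gamma, N) / root_n for l in range(N + 1)]

def moments(z):
    """Return total mass, mean and standard deviation of a discrete profile."""
    total = sum(z)
    mean = sum(l * zl for l, zl in enumerate(z)) / total
    var = sum((l - mean) ** 2 * zl for l, zl in enumerate(z)) / total
    return total, mean, math.sqrt(var)

if __name__ == "__main__":
    N, lam, gamma = 10_000, 1.0, 1.0
    total, mean, std = moments(discrete_profile(N, lam, gamma))
    print(total, mean, std)  # compare with 1, lam * sqrt(N), gamma * N**0.25
```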
We then have,

u=o ^ fe=o ^ 

N-xVN ^ J 1:1 11 

-={-={x ^))*e vTv^ vw^p( + ) (E6) 

Defining $x_1 = l_1/\sqrt{N}$ we have, in the limit of large $N$, that,

p{x) = 2e-^ V E Tr^'(^ - ^) X 

r - ^)'^e--e- + il^) (E7) 

Jo V-/Vx V-/V 

In the limit of large $N$, we can evaluate the integral out to order $1/\sqrt{N}$. The idea is that
the integrand is a product of two functions of $x_1$, where one of the functions is a Gaussian
that converges to a $\delta$-function centered at $\lambda$. The integral is then evaluated to order $1/\sqrt{N}$
by expanding the other function out to second order in $x_1 - \lambda$, and integrating under the
narrow Gaussian envelope. In the Taylor expansion, we may ignore any terms containing an
$x^n/\sqrt{N}$ where $n > 1$, since such terms either vanish or contribute a term of order at least
$1/N$.

Following these guidelines, the integral becomes,



N 



Nx 



dxi [1 + U 



xi — X U^U — 1) ,xi — A 



A 



1 A^V4 
[1 - x{xi - A) + -x\xi - A)2]— = exp[- 

^ 7V27r ^7 



xi — X + 



+ 



A 



(E8) 
Defining $x_1' = x_1 - \lambda$, this becomes,

r-i / , 2 /2i r / ^4 l r ^4 n ^ r \/ N .Q, 

[1 - xj., + -J. 1 exp[-xi;^l exp[-^-^]-^exp|-^i,, ] 



^.;,J2.aZ^exp[-||.fi 

7^ 7V27r 27^ 

AMI + -^(Kh + *) - |j - M^)l."e-- X 



J — C 



Y X A z A zA 

(E9) 



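Now, instead of working with $p(x)$ directly, we work with its Laplace Transform,
$P(s) = \int_0^\infty p(x) e^{-sx}\,dx$. By Taylor expanding out to second order in $x - \lambda$, we have that,
to first order in $1/\sqrt{N}$, the Laplace Transform of $p(x)$ is given by,

$$
\begin{aligned}
P(s) &= e^{-\lambda s} \int_0^\infty dx \left(1 - s(x - \lambda) + \frac{1}{2} s^2 (x - \lambda)^2\right) \frac{N^{1/4}}{\gamma\sqrt{2\pi}} \exp\left[-\frac{\sqrt{N}}{2\gamma^2}(x - \lambda)^2\right] \\
&= e^{-\lambda s} \int_{-\infty}^{\infty} dx' \left[1 - s x' + \frac{1}{2} s^2 x'^2\right] \frac{N^{1/4}}{\gamma\sqrt{2\pi}} \exp\left[-\frac{\sqrt{N}}{2\gamma^2} x'^2\right] \\
&= e^{-\lambda s}\left(1 + \frac{\gamma^2}{2\sqrt{N}} s^2\right)
\end{aligned}
\tag{E10}
$$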
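
The expansion for $P(s)$ can be compared against a direct numerical Laplace transform of the narrow Gaussian. The following sketch uses a plain trapezoidal rule from the standard library; the function names, the integration window of twelve standard deviations, and the parameter values are assumptions chosen for illustration. The exact transform of a Gaussian on the whole line is $e^{-\lambda s + \gamma^2 s^2/(2\sqrt{N})}$, so the two results should differ only at order $1/N$.

```python
import math

def laplace_numeric(s, lam, gamma, N, steps=20_000):
    """Trapezoidal estimate of the Laplace transform of the Gaussian ansatz."""
    sigma = gamma * N ** -0.25
    lo = max(0.0, lam - 12 * sigma)
    hi = lam + 12 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        density = math.exp(-(x - lam) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        total += weight * density * math.exp(-s * x)
    return total * h

def laplace_first_order(s, lam, gamma, N):
    """First-order expression e^{-lam s} (1 + gamma^2 s^2 / (2 sqrt(N)))."""
    return math.exp(-lam * s) * (1 + gamma ** 2 * s ** 2 / (2 * math.sqrt(N)))

if __name__ == "__main__":
    s, lam, gamma, N = 2.0, 1.0, 1.0, 10 ** 8
    print(laplace_numeric(s, lam, gamma, N), laplace_first_order(s, lam, gamma, N))
```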



We also have,
P{s) = 26-^^ y J- y ^ / dxe-^V^e-^^ X 
u=o ^ '4 fc=o '^'J 

exp[ 



7V27r 



27 



f 

J — c 



^4=0 ^ '4 



A 



2A2 



AT 



A 



A 



exp[- 



Jexp[^J — 7=exp[- 



2ViV72 



7 7V27r 
Z—^ /J _l_ <<•, Z—c h\ 



X 



A 



2A2 



x'2)(1-Ax' + ^AV') X 



[1 + ^{\{h + k)- ^i^^ - /4(^ - A) + l7^A^ - + 



2A2 



)]x 



2A2 2^ 



^4 \ / , /I 2 ^4'-' ^4(^4 — 1) \ /2 



A; 



A;A fc2 



2A2 

iVV4 



X X 



exp[- 



N 



272 



/2l 
1 \2l4^ °° ,,k 



oo 



^4! K + Ki, ^ k\ 

u=o ^ '■^ k=o 

1 /X/, ,\ ^4(^4 + ^) , XX 1 2x2 ,2 ^4(^4 " 1)7^ 

[1 + + i) - - ,,(- - A) + -7^A= - (47^ + - X 

Z"" ,„ ,1 2 '4S i4(i4-l) ,i4 1,2 *\ ,2, 'V''' 



[1 + 4^((A - ^)(2/4 + k)+ f (A^ - /4) + '-^^^^^^ + - A7^) - k\ 
VN A A^ A 



A ' 2 



00 



[1 + 4^((A - y)(2/4 + /.) + f (A^ - h) + + ^(^ _ ) - A/. 

VA'^ A A^ A 



Matching powers of s between the two expressions for P{s) gives, 



00 



^ kl K + Ki^ A 

(4=0 * 



00 

\2 



l = 2e-^^yi-^^!^ (E12) 



As $N \to \infty$, the $1/\sqrt{N}$ term in the first equality becomes negligible, and so the first
and the third equalities become identical to one another. As a result, we obtain the pair of
equations,



oo 



2V^ 1 X^^^Kl 



4 



Ul K + Ki. 



oo 



(E13) 



Note that these equations are not sufficient by themselves to solve for $\lambda$, $\gamma$. However,
they show that the assumption of a Gaussian profile leads to a self-consistent set of equations
that allows us to solve for the steady state of the system in the limit of large $N$. This
validates the analysis carried out in the main text, which leads us to the large-$N$ result of
$\bar{\kappa} = \max\{2e^{-\mu} - 1, 0\}$.

2. Multi-chromosomed genome 

We may transform Eq. (47) into its continuous analogue as follows. We first note that
the binomial probability distribution $2^{-N}\binom{N}{k}$ approaches a Gaussian in the limit of large $N$,
with a mean of $N/2$ and a variance $\sigma^2 = N/4$. Since a normalized Gaussian is given by,

$$
\frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \bar{x})^2}{2\sigma^2}\right]
\tag{E14}
$$

we obtain, in the limit of large $N$, that,



Therefore, we have that. 



i— j;^ -Y'e ^ zi,+i,zi,+i_i,_k (E16) 
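
The Gaussian approximation to the binomial weights invoked at the start of this subsection is easy to check numerically. The sketch below compares $2^{-N}\binom{N}{k}$ with a Gaussian of mean $N/2$ and variance $N/4$; the function names and the choice $N = 1000$ are assumptions made for illustration.

```python
import math

def binomial_weight(N, k):
    """Exact binomial probability 2^{-N} * C(N, k)."""
    return math.comb(N, k) / 2 ** N

def gaussian_weight(N, k):
    """Gaussian with mean N/2 and variance N/4, evaluated at k."""
    return math.sqrt(2 / (math.pi * N)) * math.exp(-2 * (k - N / 2) ** 2 / N)

if __name__ == "__main__":
    N = 1000
    for k in (450, 480, 500, 520, 550):
        print(k, binomial_weight(N, k), gaussian_weight(N, k))
```

The agreement improves as $N$ grows and is best near the mean, which is the region that dominates the sums in the large-$N$ limit.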
Defining $x = l/\sqrt{N}$, $x_1 = l_1/\sqrt{N}$, $x_4 = l_4/\sqrt{N}$, we obtain,



A_ ArV4 / 2 h ^1±^ 1-1/2 X 

VN{x^ - xf + 2(/3 + k){x^ -x) + 
exp[ 7—1 iS!—] X 



>/IV(«n-.-!j^) 



and so, for very large $N$, keeping terms up to order $1/\sqrt{N}$, we obtain,



(E17) 



2V^(a;i+x) \/7r(a;i+a;) 



X 



""""P^ ^ 2(xi+x) xi + x 2a/]V(xi + x)^^ VA^(xi+x) N{xi+xy^^^ 

/ (iX4[x4(xi + X - a;4)]'3(l - ^ y3g-X4(a.i+x-x-4)g-4^ ^ 

Jo ^/ N{xi-\- X — Xi) 

p(x4 H ■^)p{Xi + X — X4 



N VN 

(E18) 

Instead of working with $p(x)$ directly, we will work with its Laplace Transform. To this
end, we define $P(s) = \int_0^\infty p(x) e^{-sx}\,dx$. As with the two-chromosomed genome, our strategy
will be to take the Laplace Transform of both sides of the above equation, and expand out
to order $1/\sqrt{N}$. This will provide a set of equalities that must be satisfied in the limit of
large $N$, which will allow us to solve for $\bar{\kappa}$ in the $N \to \infty$ limit.

To begin, since, in the large $N$ limit, we are assuming that the $z_l$ converge to a Gaussian
distribution with a mean that scales as $\sqrt{N}$ and a standard deviation that scales as $N^{1/4}$, we
may let $\lambda\sqrt{N}$ denote the mean of the distribution and $\gamma N^{1/4}$ denote the standard deviation.
If we switch from the $l$ to the $x$ representation, then the Gaussian distribution has a mean
of $\lambda$ and a standard deviation of $\gamma N^{-1/4}$, so that we obtain,

$$
p(x) = \frac{N^{1/4}}{\gamma\sqrt{2\pi}} \exp\left[-\frac{\sqrt{N}}{2\gamma^2}(x - \lambda)^2\right]
\tag{E19}
$$

As with the two-chromosomed genome, we have,

$$
P(s) = e^{-\lambda s}\left(1 + \frac{\gamma^2}{2\sqrt{N}} s^2\right)
\tag{E20}
$$

However, from Eq. (E18) we also have that,



00 _ 



l3=0 



y. K + Ki 



E 

3 fc=0 



k\ 



dxe X 



h + k 



roo 

/ dxi(l + , 

Jo 2VN{x, + x) 



)N 



1/4, 



{k + k){xi-x) 
exp — -\ exp - 

^ xi + x ' 



71 {Xi + x) 

{I3 + kf 



exp[ 



A^(a;i - xf 
2{xi + x) 



X 



2VA^(a;i + x)' 



{k + k){xi-xf 

exp[--^ r^^-^ n:—^] X 



{k + kf{xi-x) {k + kfixi-xf 
^M- rrr/ ;T^]exp[-^ ^ ^1 x 



2{xi + xy 



^N{xi+xy 2VN{xi+xy 

r''^ dx.Mx, +X- X,)f^{l - J'^^' + ^^-.,(.,+.-.,)(^ ^ X4^) X 

Jo VN{xi + x- X4) VN 

X4 - A + exp[-^(xi + X - X4 - A - -y=f] (E21) 



7\/27r 



exp[ 



27' 



N 7\/27r 



272 



AT' 
Now, define $y = x_1 + x$, so that,



ls=0 ' '3 A;=0 



Jo 2y/Ny' V t^V 



2VNy M Try y 

y ' ^ 2yiV?/ ' ^ y^ 



2' , X 



^exp[-^((,-x,-Af-2Ji^-^ + -)l (E22) 
Defining $x' = x - y/2$ we obtain, in the limit of large $N$, that,



J-oo 2 y y^ 



iVl/^/ — exp[ 1 X 




ny y 

f dx,[x,{y - x,)ni - -^rlf^) (1 + X4^)e--(--) X 
Jo VN[y - X4) VN 

— = exp 2^4 - A)^ + 2 % ^ + ^ X 

:^-p[-^((,-.,-A) -2 ^ +^)] 

(E23) 

To evaluate the $x'$-integral to order $1/\sqrt{N}$, we note that only terms up to $x'^2$ will give a
contribution that is up to order $1/\sqrt{N}$, and terms of order $x'$ will integrate out to 0. The
integral is then,



J-oo 1/ y ^ y T^y y 



and so, 



e; 'sU + K,, £; fc! J„ ^ 2ViV„ ' 



/3=0 '■^ A;=0 



-L ~r ( S ~ S ~r r ) ) X 

/lV^8 2 2y " 



Jo V-/V - X4) VN 



(E25) 



Defining $x_5 = y - x_4$, we obtain that the double integral over $y$ and $x_4$ is,

/ dxidx5e ' 2 1 + ^ — ^s^-^^s + x^k + k)-^ ^) x 

Jo VN 8 2 ' ' 



2 



^ exp[-^((x4 - A)^ + r^'-^'^'^' + ^)] X 



Now, defining $x_4' = x_4 - \lambda$, and $x_5' = x_5 - \lambda$, we have, in the limit of large $N$, that
the integral from $-\lambda$ to $\infty$ may be taken to be an integral from $-\infty$ to $\infty$, because of
the narrowness of the Gaussian distribution. Also, any term containing an $x_4$ or $x_5$ that is
coupled to a $1/\sqrt{N}$ factor may have the $x_4$ and $x_5$ replaced with $\lambda$, since for an integral
involving $x_4'$ or $x_5'$ to survive, it must be on the order of at least $x_4'^2$ or $x_5'^2$. Since such
integrals produce a $1/\sqrt{N}$ factor or higher, the overall term is of order at least $1/N$, which
is beyond the order of the expansion we are seeking.

We therefore have that the integral is given by,



/ dx',dx',e-'^ (1 + ^)^^ (1 + ^)'3e-Me-^4e-44 x 

oo J-oo ^ ^ 

exp —\ exp = — exp — —\ exp = — x 



7V27r 27^ 7V27r 27^ 

To evaluate this integral, we expand the functions that do not converge to $\delta$-functions in
a Taylor series. Since we are only interested in terms up to order $1/\sqrt{N}$, we only expand out
to order $x_4'^2$ or $x_5'^2$. Furthermore, we may neglect any cross terms of $x_4'$ and $x_5'$. The reason
for this is that for such terms to survive, the $x_4'$ term must be coupled to at least another
$x_4'$ term, and similarly for the $x_5'$ term. This produces an integral which is of order at least
$1/\sqrt{N} \times 1/\sqrt{N} = 1/N$, and so may be neglected.

The integral is then,



/oo /*00 -j^ -j^ 



kih - 1) 



1 



A 

h 
A' 



2A2 

/3(/3 - 11 



:i + ^4 + - Ax's + ^A2<')(1 + -A + fr^') X 



7V27r 



exp[ 



2A2 



exp[ 



27 



2 
1 

2' 
/2l 
/2 _|_ p /-oo /-oo 1 1 

= (1 - J ^ dx',dx',{l - + + + x'i)s 

(1 + (- - A + -jxs + (_ + __ — + -A - /3 + 2A2 ^ 



ArV4 ^ ^1/4 ^ 



X , ^ , / ^3^; ^A 1 2 , /3(^3-l) ^ m 

27r 7V27r 27^ 



7- 



Going back to Eq. (E27), we then have that the overall integral is given by, 

e-^e-^^A^^l + -^[2/3(A -\)+ 7^(A^ - 2/3) + ^^&^] 



(E28) 



, 1 r 2/\ ^3x ,1 , 1 A + 7% 

("-A^-'^^+V^^^^ 



(E29) 
Now, noting that $\sum_{k=0}^{\infty} \mu^k/k! = e^{\mu}$ and $\sum_{k=0}^{\infty} k\mu^k/k! = \mu e^{\mu}$, we obtain,



+^2e-^\-^' V —^^^h^(X --)-u\ 



13=^ 

However, given that, to first order in $1/\sqrt{N}$, we have $P(s) = e^{-\lambda s}(1 + \gamma^2/(2\sqrt{N}) s^2)$,
matching powers of $s$ gives us that,



2 A 1 A2'3«;,, 

X 



^3=0 ' ■ 



[1 + -^(2i3(A -\)+ 7^(A^ - 2(3) + 



,^ = (A + .V^=f:l^ (E31) 

^3=0 '3 

In the limit of large $N$, the $1/\sqrt{N}$ factor in the first equality becomes negligible, and so
we obtain,

1 A2'3«. 



;3=o '3 

= ^ (E32) 

The last equality implies that $\gamma^2 = \lambda$, and so, we have that, in the limit of large $N$, the
mean fitness $\bar{\kappa}$ may be obtained by solving the pair of equations,

1 A"»K,, 



00 



^ = A=(l-2e-'j:i^^:f±l) (E33) 
As a final calculation for this subsection, we compute, in the limit of large $N$, the
probability that the fusion of two haploids produces a diploid with $l$ homologous gene pairs
lacking a functional copy of the given gene. So, suppose two haploids, each with $n$ defective
genes, overlap. To determine the probability that the overlap produces a diploid with exactly $l$
homologous gene pairs lacking a functional copy of the given gene, we note that, given a
haploid, there are $\binom{n}{l}$ ways of placing $l$ defective genes in the other haploid so that the diploid
has $l$ homologous gene pairs lacking a functional copy of the given gene. The remaining $n - l$
defective genes in the other haploid must be in the $N - n$ slots where the first haploid has
a functional copy of the gene. Since there are $\binom{N-n}{n-l}$ ways of placing these genes, we obtain
that there are a total of $\binom{n}{l}\binom{N-n}{n-l}$ distinct haploid sequences which can fuse with the given
haploid to produce a diploid with $l$ homologous gene pairs lacking a functional copy of the
given gene. Since there are a total of $\binom{N}{n}$ distinct haploids having $n$ defective genes, the
probability that haploid fusion will lead to a diploid that has exactly $l$ homologous pairs
lacking a functional copy of the given gene is,

(n\ (N-n\ -I I , 7 I ; \2 1 i l+k-2n 

iJU-J = 1 -TT (n - / + fc)^ -pj 1 + 

n n^i- N-l + k 1-^ 

\nJ k=l k=l N 



(E34) 

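In the limit of large $N$, with $n \to \lambda\sqrt{N}$, the above expression becomes,

$$
\frac{1}{l!}\lambda^{2l}\left(1 - \frac{\lambda}{\sqrt{N}}\right)^{\lambda\sqrt{N}} = \frac{1}{l!}\lambda^{2l}\left[\left(1 - \frac{\lambda}{\sqrt{N}}\right)^{\sqrt{N}/\lambda}\right]^{\lambda^2} \to \frac{1}{l!}\lambda^{2l} e^{-\lambda^2}
\tag{E35}
$$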
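
The counting argument above gives a hypergeometric probability, and with $n = \lambda\sqrt{N}$ it approaches a Poisson distribution with mean $\lambda^2$ as $N$ grows. The sketch below compares the two for a large but finite $N$. The function names and the values $N = 10^6$, $n = 1000$ (so $\lambda = 1$) are illustrative assumptions.

```python
import math

def overlap_probability(N, n, l):
    """Probability that two haploids with n defective genes share exactly l of them."""
    return math.comb(n, l) * math.comb(N - n, n - l) / math.comb(N, n)

def poisson_limit(lam, l):
    """Large-N limit: Poisson weight with mean lam**2."""
    return lam ** (2 * l) * math.exp(-lam ** 2) / math.factorial(l)

if __name__ == "__main__":
    N, n = 10 ** 6, 1000
    lam = n / math.sqrt(N)
    for l in range(5):
        print(l, overlap_probability(N, n, l), poisson_limit(lam, l))
```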